enterprise-azureai
Unleash the power of Azure AI to your application developers in a secure & manageable way with Azure API Management and Azure Developer CLI.
Stars: 75
Azure OpenAI Service is a central capability with Azure API Management, providing guidance and tools for organizations to implement Azure OpenAI in a production environment with an emphasis on cost control, secure access, and usage monitoring. It includes infrastructure-as-code templates, CI/CD pipelines, secure access management, usage monitoring, load balancing, streaming requests, and end-to-end samples like ChatApp and Azure Dashboards.
README:
page_type: sample languages:
- azdeveloper
- csharp
- nodejs
- bicep
- bash
- powershell
- dockerfile
- json
- xml products:
- azure-api-management
- azure-app-configuration
- azure-cache-redis
- azure-container-apps
- azure-container-registry
- azure-dns
- azure-log-analytics
- azure-monitor
- azure-policy
- azure-private-link
- dotnet
- azure-app-service
- azure-key-vault
- azure-cosmos-db
- azure-openai urlFragment: enterprise-azureai name: Azure OpenAI Service as a central capability with Azure API Management description: Unleash the power of Azure OpenAI in your company in a secure and manageable way with Azure API Management and Azure Developer CLI
Unleash the power of Azure OpenAI in your company in a secure & manageable way with Azure API Management and Azure Developer CLI (azd
).
This repository provides guidance and tools for organizations looking to implement Azure OpenAI in a production environment with an emphasis on cost control, secure access, and usage monitoring. The aim is to enable organizations to effectively manage expenses while ensuring that the consuming application or team is accountable for the costs incurred.
[!NOTE]
This repository uses an AI Proxy to load-balance & log the traffic between Azure API Management and Azure OpenAI Service. In May 2024 Microsoft announced new features in Azure API Management Policies related to integrating with Azure OpenAI Service that overlap with the AI Proxy. We recommend to use the new features in Azure API Management Policies for new deployments, but if you need to implement customizations or additional features in the proxy, the AI Proxy is still very relevant. Forazd
implementation guidance on the new features in Azure API Management Policies, see here.
- Infrastructure-as-code: Bicep templates for provisioning and deploying the resources.
- CI/CD pipeline: GitHub Actions and Azure DevOps Pipelines for continuous deployment of the resources to Azure.
- Secure Access Management: Best practices and configurations for managing secure access to Azure OpenAI Services.
- Usage Monitoring & Cost Control: Solutions for tracking the usage of Azure OpenAI Services to facilitate accurate cost allocation and team charge-back.
- Load Balance: Utilize & loadbalance the capacity of Azure OpenAI across regions or provisioned throughput (PTU)
- Streaming requests: Support for streaming requests to Azure OpenAI, for all features (e.g. additional logging and charge-back)
- End-to-end sample: Including Sample ChatApp, Azure Dashboards, content filters and policies
Read more: Architecture in detail
- Infrastructure-as-code (IaC) Bicep files under the
infra
folder that demonstrate how to provision resources and setup resource tagging for azd. - A dev container configuration file under the
.devcontainer
directory that installs infrastructure tooling by default. This can be readily used to create cloud-hosted developer environments such as GitHub Codespaces or a local environment via a VSCode DevContainer. - Continuous deployment workflows for CI providers such as GitHub Actions under the
.github
directory, and Azure Pipelines under the.azdo
directory that work for most use-cases. - The .NET 8.0 chargeback proxy application under the
src
folder. - The NodeJS Sample ChatApp application under the
src
folder.
- Azure Developer CLI
- Azure CLI
- .NET 8.0 SDK
- Docker Desktop
- Node.js v18.17 or higher
- jq required on Mac and Linux
azd init -t Azure/enterprise-azureai
If you already cloned this repository to your local machine or run from a Dev Container or GitHub Codespaces you can run the following command from the root folder.
azd init
It will prompt you to provide a name that will later be used in the name of the deployed resources. If you're not logged into Azure, it will also prompt you to first login.
azd auth login
This repository uses environment variables to configure the deployment, which can be used to enable optional features. You can set these variables with the azd env set
command. Learn more about all optional features here.
azd env set USE_REDIS_CACHE_APIM '<true-or-false>'
azd env set SECONDARY_OPENAI_LOCATION '<your-secondary-openai-location>'
In the azd template, we automatically set an environment variable for your current IP address. During deployment, this allows traffic from your local machine to the Azure Container Registry for deploying the containerized application.
[!NOTE]
To determine your IPv4 address, the service icanhazip.com is being used. To control the IPv4 addresss used directly (without the service), edit the MY_IP_ADDRESS field in the .azure<name>.env file. This file is created after azd init. Without a properly configured IP address, azd up will fail.
azd up
It will prompt you to login, pick a subscription, and provide a location (like "eastus"). We've added an extra conditional parameter to deploy the Sample ChatApp, for demo-ing purposes.
Read more: Sample ChatApp
Then it will provision the resources in your account and deploy the latest code.
[!NOTE]
Because Azure OpenAI isn't available in all regions, you might get an error when you deploy the resources. You can find more information about the availability of Azure OpenAI here.
For more details on the deployed services, see additional details below.
[!NOTE]
Sometimes the DNS zones for the private endpoints aren't created correctly / in time. If you get an error when you deploy the resources, you can try to deploy the resources again.
You can enable Azure Redis Cache to improve the performance of Azure API Management. To enable this feature, set the USE_REDIS_CACHE_APIM
environment variable to true
.
azd env set USE_REDIS_CACHE_APIM 'true'
[!NOTE] Deployment of Azure Redis Cache can take up to 30 minutes.
You can enable a secondary Azure OpenAI location to improve the availability of Azure OpenAI. To enable this feature, set the SECONDARY_OPENAI_LOCATION
environment variable to the location of your choice.
azd env set SECONDARY_OPENAI_LOCATION '<your-secondary-openai-location>'
This project includes a Github workflow and an Azure DevOps Pipeline for deploying the resources to Azure on every push to main. That workflow requires several Azure-related authentication secrets to be stored as Github action secrets. To set that up, run:
azd pipeline config
You can configure azd
to provision and deploy resources to your deployment environments using standard commands such as azd up
or azd provision
. When platform.type
is set to devcenter, all azd
remote environment state and provisioning uses dev center components. azd
uses one of the infrastructure templates defined in your dev center catalog for resource provisioning. In this configuration, the infra folder in your local templates isn’t used.
azd config set platform.type devcenter
The Sample ChatApp is a simple NodeJS application that uses the API Management endpoints, exposing Azure OpenAI Service, to test the deployment and see how the Azure OpenAI Service works. In the ChatApp you can configure which API Management Subscription you want to use and with which deployment model, creating an end-to-end experience.
The deployed resources include a Log Analytics workspace with an Application Insights based dashboard to measure metrics like server response time and failed requests. We also included some custom visuals in the dashboard to visualize the token usage per consumer of the Azure OpenAI Service.
To open that dashboard, run this command once you've deployed:
azd monitor --overview
To clean up all the resources you've created and purge the soft-deletes, simply run:
azd down --purge --force
The resource group and all the resources will be deleted and you'll be prompted if you want the soft-deletes to be purged.
A tests.http file with relevant tests you can perform is included, to check if your deployment is successful. You need the 2 subcription keys for Marketing and Finance, created in API Management in order to test the API. You can find more information about how to create subscription keys here.
After forking this repo, you can use this GitHub Action to enable CI/CD for your fork. Just adjust the README in your fork to point to your own GitHub repo.
GitHub Action | Status |
---|---|
azd Deploy |
The following section examines different concepts that help tie in application and infrastructure.
This repository illustrates how to integrate Azure OpenAI as a central capability within an organization using Azure API Management and Azure Container Apps. Azure OpenAI offers AI models for generating text, images, etc., trained on extensive data. Azure API Management facilitates secure and managed exposure of APIs to the external environment. Azure Container Apps allows running containerized applications in Azure without infrastructure management. The repository includes a .NET 8.0 proxy application to allocate Azure OpenAI Service costs to the consuming application, aiding in cost control. The proxy supports load balancing and horizontal scaling of Azure OpenAI instances. A chargeback report in the Azure Dashboard visualizes Azure OpenAI Service costs, making it a centralized capability within the organization.
We've used the Azure Developer CLI Bicep Starter template to create this repository. With azd
you can create a new repository with a fully functional CI/CD pipeline in minutes. You can find more information about azd
here.
One of the key points of azd
templates is that we can implement best practices together with our solution when it comes to security, network isolation, monitoring, etc. Users are free to define their own best practices for their dev teams & organization, so all deployments are followed by the same standards.
The best practices we've followed for this architecture are: Azure Integration Service Landingzone Accelerator and for Azure OpenAI we've used the blog post Azure OpenAI Landing Zone reference architecture. For the chargeback proxy we've used the setup from the Azure Container Apps Landingzone Accelerator.
When it comes to security, there are recommendations mentioned for securing your Azure API Management instance in the accelerators above. For example, with the use of Front Door or Application Gateway (see this repository), proving Layer 7 protection and WAF capabilities, and by implementing OAuth authentication on the API Management instance. How to implement OAuth authentication on API Management (see here repository).
We're also using Azure Monitor Private Link Scope. This allows us to define the boundaries of my monitoring network, and only allow traffic from within that network to my Log Analytics workspace. This is a great way to secure your monitoring network.
In order to provide an end-to-end experience and enabling user to demo from a GUI, we've included a Sample ChatApp. This is a simple NodeJS application based on the Azure Chat Solution Accelerator. It uses Azure Cosmos DB to store the chat messages and leverages Azure Key Vault to store the secrets used in the appliction.
Azure API Management is a fully managed service that enables customers to publish, secure, transform, maintain, and monitor APIs. It is a great way to expose your APIs to the outside world in a secure and manageable way.
Azure OpenAI is a service that provides AI models that are trained on a large amount of data. You can use these models to generate text, images, and more.
Managed identities allows you to secure communication between services. This is done without having the need for you to manage any credentials.
Azure Virtual Network allows you to create a private network in Azure. You can use this to secure communication between services.
Azure Private DNS Zone allows you to create a private DNS zone in Azure. You can use this to resolve hostnames in your private network.
Application Insights allows you to monitor your application. You can use this to monitor the performance of your application.
Log Analytics allows you to collect and analyze telemetry data from your application. You can use this to monitor the performance of your application.
Azure Monitor Private Link Scope allows you to define the boundaries of your monitoring network, and only allow traffic from within that network to your Log Analytics workspace. This is a great way to secure your monitoring network.
Azure Private Endpoint allows you to connect privately to a service powered by Azure Private Link. Private Endpoint uses a private IP address from your VNet, effectively bringing the service into your VNet.
Azure Container Apps allows you to run containerized applications in Azure without having to manage any infrastructure.
Azure Container Registry allows you to store and manage container images and artifacts in a private registry for all types of container deployments.
Azure Redis Cache allows you to use a secure open source Redis cache.
Azure Container Environment allows you to run containerized applications in Azure without having to manage any infrastructure.
Azure Cosmos DB allows you to use a fully managed NoSQL database for modern app development.
Azure Key Vault allows you to safeguard cryptographic keys and other secrets used by cloud apps and services.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for enterprise-azureai
Similar Open Source Tools
enterprise-azureai
Azure OpenAI Service is a central capability with Azure API Management, providing guidance and tools for organizations to implement Azure OpenAI in a production environment with an emphasis on cost control, secure access, and usage monitoring. It includes infrastructure-as-code templates, CI/CD pipelines, secure access management, usage monitoring, load balancing, streaming requests, and end-to-end samples like ChatApp and Azure Dashboards.
serverless-chat-langchainjs
This sample shows how to build a serverless chat experience with Retrieval-Augmented Generation using LangChain.js and Azure. The application is hosted on Azure Static Web Apps and Azure Functions, with Azure Cosmos DB for MongoDB vCore as the vector database. You can use it as a starting point for building more complex AI applications.
genai-for-marketing
This repository provides a deployment guide for utilizing Google Cloud's Generative AI tools in marketing scenarios. It includes step-by-step instructions, examples of crafting marketing materials, and supplementary Jupyter notebooks. The demos cover marketing insights, audience analysis, trendspotting, content search, content generation, and workspace integration. Users can access and visualize marketing data, analyze trends, improve search experience, and generate compelling content. The repository structure includes backend APIs, frontend code, sample notebooks, templates, and installation scripts.
vector-vein
VectorVein is a no-code AI workflow software inspired by LangChain and langflow, aiming to combine the powerful capabilities of large language models and enable users to achieve intelligent and automated daily workflows through simple drag-and-drop actions. Users can create powerful workflows without the need for programming, automating all tasks with ease. The software allows users to define inputs, outputs, and processing methods to create customized workflow processes for various tasks such as translation, mind mapping, summarizing web articles, and automatic categorization of customer reviews.
conversational-agent-langchain
This repository contains a Rest-Backend for a Conversational Agent that allows embedding documents, semantic search, QA based on documents, and document processing with Large Language Models. It uses Aleph Alpha and OpenAI Large Language Models to generate responses to user queries, includes a vector database, and provides a REST API built with FastAPI. The project also features semantic search, secret management for API keys, installation instructions, and development guidelines for both backend and frontend components.
aws-bedrock-with-rag-and-react
This solution provides a low-code ReactJS application to prototype and vet business use cases for GenAI using Retrieval Augmented Generation (RAG). It includes a backend Flask application that uses LangChain to provide PDF data as embeddings to a text-gen model via Amazon Bedrock and a vector database with FAISS or Kendra Index. The solution utilizes Amazon Bedrock as the only cost-generating AWS service.
azure-search-openai-demo
This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access a GPT model (gpt-35-turbo), and Azure AI Search for data indexing and retrieval. The repo includes sample data so it's ready to try end to end. In this sample application we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions and roles.
azure-search-openai-javascript
This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access the ChatGPT model (gpt-35-turbo), and Azure AI Search for data indexing and retrieval.
generative-ai-application-builder-on-aws
The Generative AI Application Builder on AWS (GAAB) is a solution that provides a web-based management dashboard for deploying customizable Generative AI (Gen AI) use cases. Users can experiment with and compare different combinations of Large Language Model (LLM) use cases, configure and optimize their use cases, and integrate them into their applications for production. The solution is targeted at novice to experienced users who want to experiment and productionize different Gen AI use cases. It uses LangChain open-source software to configure connections to Large Language Models (LLMs) for various use cases, with the ability to deploy chat use cases that allow querying over users' enterprise data in a chatbot-style User Interface (UI) and support custom end-user implementations through an API.
dataherald
Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English. You can use Dataherald to: * Allow business users to get insights from the data warehouse without going through a data analyst * Enable Q+A from your production DBs inside your SaaS application * Create a ChatGPT plug-in from your proprietary data
FlowTest
FlowTestAI is the world’s first GenAI powered OpenSource Integrated Development Environment (IDE) designed for crafting, visualizing, and managing API-first workflows. It operates as a desktop app, interacting with the local file system, ensuring privacy and enabling collaboration via version control systems. The platform offers platform-specific binaries for macOS, with versions for Windows and Linux in development. It also features a CLI for running API workflows from the command line interface, facilitating automation and CI/CD processes.
n8n-docs
n8n is an extendable workflow automation tool that enables you to connect anything to everything. It is open-source and can be self-hosted or used as a service. n8n provides a visual interface for creating workflows, which can be used to automate tasks such as data integration, data transformation, and data analysis. n8n also includes a library of pre-built nodes that can be used to connect to a variety of applications and services. This makes it easy to create complex workflows without having to write any code.
raggenie
RAGGENIE is a low-code RAG builder tool designed to simplify the creation of conversational AI applications. It offers out-of-the-box plugins for connecting to various data sources and building conversational AI on top of them, including integration with pre-built agents for actions. The tool is open-source under the MIT license, with a current focus on making it easy to build RAG applications and future plans for maintenance, monitoring, and transitioning applications from pilots to production.
cluster-toolkit
Cluster Toolkit is an open-source software by Google Cloud for deploying AI/ML and HPC environments on Google Cloud. It allows easy deployment following best practices, with high customization and extensibility. The toolkit includes tutorials, examples, and documentation for various modules designed for AI/ML and HPC use cases.
serverless-pdf-chat
The serverless-pdf-chat repository contains a sample application that allows users to ask natural language questions of any PDF document they upload. It leverages serverless services like Amazon Bedrock, AWS Lambda, and Amazon DynamoDB to provide text generation and analysis capabilities. The application architecture involves uploading a PDF document to an S3 bucket, extracting metadata, converting text to vectors, and using a LangChain to search for information related to user prompts. The application is not intended for production use and serves as a demonstration and educational tool.
DAILA
DAILA is a unified interface for AI systems in decompilers, supporting various decompilers and AI systems. It allows users to utilize local and remote LLMs, like ChatGPT and Claude, and local models such as VarBERT. DAILA can be used as a decompiler plugin with GUI or as a scripting library. It also provides a Docker container for offline installations and supports tasks like summarizing functions and renaming variables in decompilation.
For similar tasks
enterprise-azureai
Azure OpenAI Service is a central capability with Azure API Management, providing guidance and tools for organizations to implement Azure OpenAI in a production environment with an emphasis on cost control, secure access, and usage monitoring. It includes infrastructure-as-code templates, CI/CD pipelines, secure access management, usage monitoring, load balancing, streaming requests, and end-to-end samples like ChatApp and Azure Dashboards.
aws-reference-architecture-pulumi
The Pinecone AWS Reference Architecture with Pulumi is a distributed system designed for vector-database-enabled semantic search over Postgres records. It serves as a starting point for specific use cases or as a learning resource. The architecture is permissively licensed and supported by Pinecone's open-source team, facilitating the setup of high-scale use cases for Pinecone's scalable vector database.
dream-team
Build your dream team with Autogen is a repository that leverages Microsoft Autogen 0.4, Azure OpenAI, and Streamlit to create an end-to-end multi-agent application. It provides an advanced multi-agent framework based on Magentic One, with features such as a friendly UI, single-line deployment, secure code execution, managed identities, and observability & debugging tools. Users can deploy Azure resources and the app with simple commands, work locally with virtual environments, install dependencies, update configurations, and run the application. The repository also offers resources for learning more about building applications with Autogen.
rig
Rig is a Rust library designed for building scalable, modular, and user-friendly applications powered by large language models (LLMs). It provides full support for LLM completion and embedding workflows, offers simple yet powerful abstractions for LLM providers like OpenAI and Cohere, as well as vector stores such as MongoDB and in-memory storage. With Rig, users can easily integrate LLMs into their applications with minimal boilerplate code.
instructor
Instructor is a popular Python library for managing structured outputs from large language models (LLMs). It offers a user-friendly API for validation, retries, and streaming responses. With support for various LLM providers and multiple languages, Instructor simplifies working with LLM outputs. The library includes features like response models, retry management, validation, streaming support, and flexible backends. It also provides hooks for logging and monitoring LLM interactions, and supports integration with Anthropic, Cohere, Gemini, Litellm, and Google AI models. Instructor facilitates tasks such as extracting user data from natural language, creating fine-tuned models, managing uploaded files, and monitoring usage of OpenAI models.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.