
AI-Gateway
APIM ❤️ AI - This repo contains experiments on Azure API Management's AI capabilities, integrating with Azure OpenAI, AI Foundry, and much more 🚀 . New workshop experience at https://aka.ms/ai-gateway/workshop
Stars: 722

The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
README:
🧪 AI Gateway labs
➕ AI Gateway workshop provides a comprehensive learning experience using the Azure Portal
➕ Refactor most of the labs to use the new LLM built-in logging that supports streaming completions.
➕ Realtime API (Audio and Text) with Azure OpenAI 🔥 experiments with the AOAI Realtime
➕ Realtime API (Audio and Text) with Azure OpenAI + MCP tools 🔥 experiments with the AOAI Realtime + MCP
➕ Model Context Protocol (MCP) ⚙️ experiments with the client authorization flow
➕ the FinOps Framework lab to manage AI budgets effectively 💰
➕ Agentic ✨ experiments with Model Context Protocol (MCP).
➕ Agentic ✨ experiments with OpenAI Agents SDK.
➕ Agentic ✨ experiments with AI Agent Service from Azure AI Foundry.
- 🧠 AI Gateway
- 🧪 Labs with AI Agents
- 🧪 Labs with the Inference API
- 🧪 Labs based on Azure OpenAI
- 🚀 Getting started
- 🔨 Supporting tools
- 🏛️ Well-Architected Framework
- 🥇 Other Resources
The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.
AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI models, data and tools.
With the expanding horizons of AI services and their seamless integration with APIs, there is a considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. Aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provides a framework for the confident deployment of Intelligent Apps into production.
This repo explores the AI Gateway pattern through a series of experimental labs. The AI Gateway capabilities of Azure API Management plays a crucial role within these labs, handling AI services APIs, with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure AI Foundry models, which sets the standard reference for Large Language Models (LLM). However, the same principles and design patterns could potentially be applied to any third party model.
Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, Bicep files and Azure API Management policies:
Playground to experiment the Model Context Protocol with the client authorization flow. In this flow, Azure API Management act both as an OAuth client connecting to the Microsoft Entra ID authorization server and as an OAuth authorization server for the MCP client (MCP inspector in this lab).
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to experiment the Model Context Protocol with Azure API Management to enable plug & play of tools to LLMs. Leverages the credential manager for managing OAuth 2.0 tokens to backend tools and client token validation to ensure end-to-end authentication and authorization.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the OpenAI Agents with Azure OpenAI models and API based tools controlled by Azure API Management.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Use this playground to explore the Azure AI Agent Service, leveraging Azure API Management to control multiple services, including Azure OpenAI models, Logic Apps Workflows, and OpenAPI-based APIs.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the Deepseek R1 model via the AI Model Inference from Azure AI Foundry. This lab uses the Azure AI Model Inference API and two APIM LLM policies: llm-token-limit and llm-emit-token-metric.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
🧪 SLM self-hosting (Phi-3)
Playground to try the self-hosted Phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
This playground leverages the FinOps Framework and Azure API Management to control AI costs. It uses the token limit policy for each product and integrates Azure Monitor alerts with Logic Apps to automatically disable APIM subscriptions that exceed cost quotas.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
🧪 Backend pool load balancing - Available with Bicep and Terraform
Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the token rate limiting policy to one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the semantic caching policy. Uses vector proximity of the prompt to previous requests and a specified similarity score threshold.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the OAuth 2.0 authorization feature using identity provider to enable more fine-grained access to OpenAPI APIs by particular users or client.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to create a combination of several policies in an iterative approach. We start with load balancing, then progressively add token emitting, rate limiting, and, eventually, semantic caching. Each of these sets of policies is derived from other labs in this repo.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try routing to a backend based on Azure OpenAI model and version.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the buil-in LLM logging capabilities of Azure API Management. Logs requests into Azure Monitor to track details and token usage.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to test storing message details into Cosmos DB through the LLM Logging to event hub.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
Playground to try the content safety policy. The policy enforces content safety checks on any LLM prompts by transmitting them to the Azure AI Content Safety service before sending to the backend LLM API.
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook
This is a list of potential future labs to be developed.
- Logic Apps RAG
- PII handling
- Python 3.12 or later version installed
- VS Code installed with the Jupyter notebook extension enabled
-
Python environment with the requirements.txt or run
pip install -r requirements.txt
in your terminal - An Azure Subscription with Contributor + RBAC Administrator or Owner roles
- Azure CLI installed and Signed into your Azure subscription
- Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it on the browser or in VS Code.
- Navigate through the available labs and select one that best suits your needs. For starters we recommend the token rate limiting.
- Open the notebook and run the provided steps.
- Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.
[!NOTE] 🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.
- Tracing - Invoke OpenAI API with trace enabled and returns the tracing information.
- Streaming - Invoke OpenAI API with stream enabled and returns response in chunks.
- AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, thereby creating an efficient simulation environment suitable for testing and development purposes on the integration with Azure API Management and other use cases. The app.py can be customized to tailor the Mock server to specific use cases.
The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs with the Well-Architected Framework pillars to set you up for success through architectural experimentation.
Lab | Security | Reliability | Performance | Operations | Costs |
---|---|---|---|---|---|
Access controlling | ⭐ | ||||
Backend pool load balancing | ⭐ | ⭐ | ⭐ | ||
Semantic caching | ⭐ | ⭐ | |||
Token rate limiting | ⭐ | ⭐ | |||
Built-in LLM logging | ⭐ | ||||
FinOps framework | ⭐ | ⭐ |
[!TIP] Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for aditional guidance.
We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.
[!IMPORTANT] This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for AI-Gateway
Similar Open Source Tools

AI-Gateway
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.

Conversation-Knowledge-Mining-Solution-Accelerator
The Conversation Knowledge Mining Solution Accelerator enables customers to leverage intelligence to uncover insights, relationships, and patterns from conversational data. It empowers users to gain valuable knowledge and drive targeted business impact by utilizing Azure AI Foundry, Azure OpenAI, Microsoft Fabric, and Azure Search for topic modeling, key phrase extraction, speech-to-text transcription, and interactive chat experiences.

LiteRT
LiteRT is Google's open-source high-performance runtime for on-device AI, previously known as TensorFlow Lite. The repository is currently not intended for open-source development, but aims to evolve to allow direct building and contributions. LiteRT supports Python versions 3.9, 3.10, 3.11 on Linux and MacOS. It ensures compatibility with existing .tflite file extension and format, offering conversion tools and continued active development under the name LiteRT.

ai-platform-engineering
The AI Platform Engineering repository provides a collection of tools and resources for building and deploying AI models. It includes libraries for data preprocessing, model training, and model serving. The repository also contains example code and tutorials to help users get started with AI development. Whether you are a beginner or an experienced AI engineer, this repository offers valuable insights and best practices to streamline your AI projects.

kodit
Kodit is a Code Indexing MCP Server that connects AI coding assistants to external codebases, providing accurate and up-to-date code snippets. It improves AI-assisted coding by offering canonical examples, indexing local and public codebases, integrating with AI coding assistants, enabling keyword and semantic search, and supporting OpenAI-compatible or custom APIs/models. Kodit helps engineers working with AI-powered coding assistants by providing relevant examples to reduce errors and hallucinations.

magic
Magic is an open-source all-in-one AI productivity platform designed to help enterprises quickly build and deploy AI applications, aiming for a 100x increase in productivity. It consists of various AI products and infrastructure tools, such as Super Magic, Magic IM, Magic Flow, and more. Super Magic is a general-purpose AI Agent for complex task scenarios, while Magic Flow is a visual AI workflow orchestration system. Magic IM is an enterprise-grade AI Agent conversation system for internal knowledge management. Teamshare OS is a collaborative office platform integrating AI capabilities. The platform provides cloud services, enterprise solutions, and a self-hosted community edition for users to leverage its features.

cline-based-code-generator
HAI Code Generator is a cutting-edge tool designed to simplify and automate task execution while enhancing code generation workflows. Leveraging Specif AI, it streamlines processes like task execution, file identification, and code documentation through intelligent automation and AI-driven capabilities. Built on Cline's powerful foundation for AI-assisted development, HAI Code Generator boosts productivity and precision by automating task execution and integrating file management capabilities. It combines intelligent file indexing, context generation, and LLM-driven automation to minimize manual effort and ensure task accuracy. Perfect for developers and teams aiming to enhance their workflows.

blurr
Panda is a proactive, on-device AI agent for Android that autonomously understands natural language commands and operates your phone's UI to achieve them. It acts as a personal operator, handling complex, multi-step tasks across different applications. With intelligent UI automation, high-quality voice, and personalized local memory, Panda simplifies interactions with technology. Built on Kotlin, Panda's architecture includes Eyes & Hands for physical device connection, The Brain for reasoning, and The Agent for execution. The project is a proof-of-concept aiming to become an indispensable assistant.

rhesis
Rhesis is a comprehensive test management platform designed for Gen AI teams, offering tools to create, manage, and execute test cases for generative AI applications. It ensures the robustness, reliability, and compliance of AI systems through features like test set management, automated test generation, edge case discovery, compliance validation, integration capabilities, and performance tracking. The platform is open source, emphasizing community-driven development, transparency, extensible architecture, and democratizing AI safety. It includes components such as backend services, frontend applications, SDK for developers, worker services, chatbot applications, and Polyphemus for uncensored LLM service. Rhesis enables users to address challenges unique to testing generative AI applications, such as non-deterministic outputs, hallucinations, edge cases, ethical concerns, and compliance requirements.

llm-twin-course
The LLM Twin Course is a free, end-to-end framework for building production-ready LLM systems. It teaches you how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices. The course is split into 11 hands-on written lessons and the open-source code you can access on GitHub. You can read everything and try out the code at your own pace.

awesome-limitless
A curated list of amazing projects and resources built with the Limitless AI Pendant API. It includes applications, CLI tools, data visualization tools, integrations with plugins and extensions, utilities for server conversion and data ingestion, SDKs and libraries for Go and TypeScript, learning resources, and official API documentation.

policy-synth
Policy Synth is a TypeScript class library that empowers better decision-making for governments and companies by integrating collective and artificial intelligence. It streamlines processes through multi-scale AI agent logic flows, robust APIs, and cutting-edge real-time AI-driven web applications. The tool supports organizations in generating, refining, and implementing smarter, data-informed strategies, fostering collaboration with AI to tackle complex challenges effectively.

CursorLens
Cursor Lens is an open-source tool that acts as a proxy between Cursor and various AI providers, logging interactions and providing detailed analytics to help developers optimize their use of AI in their coding workflow. It supports multiple AI providers, captures and logs all requests, provides visual analytics on AI usage, allows users to set up and switch between different AI configurations, offers real-time monitoring of AI interactions, tracks token usage, estimates costs based on token usage and model pricing. Built with Next.js, React, PostgreSQL, Prisma ORM, Vercel AI SDK, Tailwind CSS, and shadcn/ui components.

ai-chat-android
AI Chat Android demonstrates Google's Generative AI on Android with Firebase Realtime Database. It showcases Gemini API integration, Jetpack Compose UI elements, Android architecture components with Hilt, Kotlin Coroutines for background tasks, and Firebase Realtime Database integration for real-time events. The project follows Google's official architecture guidance with a modularized structure for reusability, parallel building, and decentralized focusing.

second-brain-ai-assistant-course
This open-source course teaches how to build an advanced RAG and LLM system using LLMOps and ML systems best practices. It helps you create an AI assistant that leverages your personal knowledge base to answer questions, summarize documents, and provide insights. The course covers topics such as LLM system architecture, pipeline orchestration, large-scale web crawling, model fine-tuning, and advanced RAG features. It is suitable for ML/AI engineers and data/software engineers & data scientists looking to level up to production AI systems. The course is free, with minimal costs for tools like OpenAI's API and Hugging Face's Dedicated Endpoints. Participants will build two separate Python applications for offline ML pipelines and online inference pipeline.

solana-ai-agents
JLB AI Agent is an innovative solution on the Solana blockchain that leverages artificial intelligence to automate complex tasks and enhance decision-making in the DeFi space. It offers real-time analytics, efficient operations, and seamless integration for both newcomers and experienced crypto enthusiasts. With features like autonomous trading, NFT management, DeFi insights, and comprehensive ecosystem integration, JLB empowers users with cutting-edge technology to navigate the dynamic landscape of blockchain.
For similar tasks

AI-Gateway
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.