AI-Gateway

APIM ❤️ OpenAI - this repo contains a set of experiments on using GenAI capabilities of Azure API Management with Azure OpenAI and other services

Stars: 344

Visit

The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.

README:

APIM ❤️ OpenAI - 🧪 Labs for the GenAI Gateway capabilities of Azure API Management

What's new ✨

➕ the AI Foundry SDK lab. ➕ the Content filtering and Prompt shielding labs. ➕ the Model routing lab with OpenAI model based routing. ➕ the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management. ➕ priority and weight parameters to the Backend pool load balancing lab. ➕ the Streaming tool to test OpenAI streaming with Azure API Management. ➕ the Tracing tool to debug and troubleshoot OpenAI APIs using Azure API Management tracing capability. ➕ image processing to the GPT-4o inferencing lab. ➕ the Function calling lab with a sample API on Azure Functions.

🧠 GenAI Gateway
🧪 Labs
🚀 Getting started
🔨 Tools
🏛️ Well-Architected Framework
🎒 Show and tell
🥇 Other Resources

The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.

AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.

With the expanding horizons of AI services and their seamless integration with APIs, there is a considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. Aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provides a framework for the confident deployment of Intelligent Apps into production.

🧠 GenAI Gateway

This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management plays a crucial role within these labs, handling AI services APIs, with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLM). However, the same principles and design patterns could potentially be applied to any LLM.

🧪 Labs

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, Bicep files and Azure API Management policies:

Current Labs

These labs are currently recommended after which to model your workloads.

🧪 Backend pool load balancing (built-in)

Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Token rate limiting

Playground to try the token rate limiting policy to one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Token metrics emitting

Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Semantic caching

Playground to try the semantic caching policy. Uses vector proximity of the prompt to previous requests and a specified similarity score threshold.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Access controlling

Playground to try the OAuth 2.0 authorization feature using identity provider to enable more fine-grained access to OpenAPI APIs by particular users or client.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 GPT-4o inferencing

Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Function calling

Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Model Routing

Playground to try routing to a backend based on Azure OpenAI model and version.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Response streaming

Playground to try response streaming with Azure API Management and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Vector searching

Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Built-in logging

Playground to try the buil-in logging capabilities of Azure API Management. Logs requests into App Insights to track details and token usage.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 SLM self-hosting (phy-3)

Playground to try the self-hosted phy-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Message storing

Playground to test storing message details into Cosmos DB through the Log to event hub policy. With the policy we can control which data will be stored in the DB (prompt, completion, model, region, tokens etc.).

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Prompt flow

Playground to try the Azure AI Studio Prompt Flow with Azure API Management.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Content Filtering

Playground to try integrating Azure API Management with Azure AI Content Safety to filter potentially offensive, risky, or undesirable content.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

🧪 Prompt Shielding

Playground to try Prompt Shields from Azure AI Content Safety service that analyzes LLM inputs and detects User Prompt attacks and Document attacks, which are two common types of adversarial inputs.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

Deprecated Labs

These labs are no longer applicable. If you have implemented logic from these labs, please consider updating.

🧪 Advanced load balancing (custom)

Playground to try the advanced load balancing (based on a custom Azure API Management policy) to either a list of Azure OpenAI endpoints or mock servers.

🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook 🟰 💬

Backlog of Labs

This is a list of potential future labs to be developed.

Assistants load balancing
Logic Apps RAG
Semantic Kernel plugin
PII handling
Llama inferencing

[!TIP] Kindly use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas or lab requests.

🚀 Getting Started

Prerequisites

Python 3.9 or later version installed
VS Code installed with the Jupyter notebook extension enabled
Azure CLI installed
An Azure Subscription with Contributor permissions
Access granted to Azure OpenAI or just enable the mock service
Sign in to Azure with Azure CLI

Quickstart

Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it on the browser or in VS Code.
Navigate through the available labs and select one that best suits your needs. For starters we recommend the backend pool load balancing.
Open the notebook and run the provided steps.
Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.

[!NOTE] 🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.

🔨 Tools

AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, thereby creating an efficient simulation environment suitable for testing and development purposes on the integration with Azure API Management and other use cases. The app.py can be customized to tailor the Mock server to specific use cases.
Tracing - Invoke OpenAI API with trace enabled and returns the tracing information.
Streaming - Invoke OpenAI API with stream enabled and returns response in chunks.

🏛️ Well-Architected Framework

The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs with the Well-Architected Framework pillars to set you up for success through architectural experimentation.

Lab	Security	Reliability	Performance	Operations	Costs
Request forwarding	⭐
Backend circuit breaking	⭐	⭐
Backend pool load balancing	⭐	⭐	⭐
Advanced load balancing	⭐	⭐	⭐
Response streaming	⭐		⭐
Vector searching	⭐	⭐	⭐
Built-in logging	⭐	⭐	⭐	⭐	⭐
SLM self-hosting	⭐		⭐

[!TIP] Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for aditional guidance.

🎒 Show and tell

[!TIP] Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the botton to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.

🥇 Other resources

Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.

We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.

🌐 WW GBB initiative

Disclaimer

[!IMPORTANT] This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

For Tasks:

Click tags to check more tools for each tasks

try request forwarding explore response streaming implement token rate limiting test semantic caching experiment with gpt-4o inferencing

For Jobs:

ai engineer machine learning engineer data scientist software developer cloud architect

Alternative AI tools for AI-Gateway

Similar Open Source Tools

AI-Gateway

github

: 344

Conversation-Knowledge-Mining-Solution-Accelerator

The Conversation Knowledge Mining Solution Accelerator enables customers to leverage intelligence to uncover insights, relationships, and patterns from conversational data. It empowers users to gain valuable knowledge and drive targeted business impact by utilizing Azure AI Foundry, Azure OpenAI, Microsoft Fabric, and Azure Search for topic modeling, key phrase extraction, speech-to-text transcription, and interactive chat experiences.

github

: 255

llm-twin-course

The LLM Twin Course is a free, end-to-end framework for building production-ready LLM systems. It teaches you how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices. The course is split into 11 hands-on written lessons and the open-source code you can access on GitHub. You can read everything and try out the code at your own pace.

github

: 3.1k

second-brain-ai-assistant-course

This open-source course teaches how to build an advanced RAG and LLM system using LLMOps and ML systems best practices. It helps you create an AI assistant that leverages your personal knowledge base to answer questions, summarize documents, and provide insights. The course covers topics such as LLM system architecture, pipeline orchestration, large-scale web crawling, model fine-tuning, and advanced RAG features. It is suitable for ML/AI engineers and data/software engineers & data scientists looking to level up to production AI systems. The course is free, with minimal costs for tools like OpenAI's API and Hugging Face's Dedicated Endpoints. Participants will build two separate Python applications for offline ML pipelines and online inference pipeline.

github

: 539

rag-time

RAG Time is a 5-week AI learning series focusing on Retrieval-Augmented Generation (RAG) concepts. The repository contains code samples, step-by-step guides, and resources to help users master RAG. It aims to teach foundational and advanced RAG concepts, demonstrate real-world applications, and provide hands-on samples for practical implementation.

github

: 91

OpenContracts

OpenContracts is an Apache-2 licensed enterprise document analytics tool that supports multiple formats, including PDF and txt-based formats. It features multiple document ingestion pipelines with a pluggable architecture for easy format and ingestion engine support. Users can create custom document analytics tools with beautiful result displays, support mass document data extraction with a LlamaIndex wrapper, and manage document collections, layout parsing, automatic vector embeddings, and human annotation. The tool also offers pluggable parsing pipelines, human annotation interface, LlamaIndex integration, data extraction capabilities, and custom data extract pipelines for bulk document querying.

github

: 803

duix.ai

Duix is a silicon-based digital human SDK for intelligent interaction, providing users with instant virtual human interaction experience on devices like Android and iOS. The SDK offers intuitive effect display and supports user customization through open documentation. It is fully open-source, allowing developers to understand its workings, optimize, and innovate further.

github

: 4.2k

codegate

CodeGate is a local gateway that enhances the safety of AI coding assistants by ensuring AI-generated recommendations adhere to best practices, safeguarding code integrity, and protecting individual privacy. Developed by Stacklok, CodeGate allows users to confidently leverage AI in their development workflow without compromising security or productivity. It works seamlessly with coding assistants, providing real-time security analysis of AI suggestions. CodeGate is designed with privacy at its core, keeping all data on the user's machine and offering complete control over data.

github

: 602

CursorLens

Cursor Lens is an open-source tool that acts as a proxy between Cursor and various AI providers, logging interactions and providing detailed analytics to help developers optimize their use of AI in their coding workflow. It supports multiple AI providers, captures and logs all requests, provides visual analytics on AI usage, allows users to set up and switch between different AI configurations, offers real-time monitoring of AI interactions, tracks token usage, estimates costs based on token usage and model pricing. Built with Next.js, React, PostgreSQL, Prisma ORM, Vercel AI SDK, Tailwind CSS, and shadcn/ui components.

github

: 73

vllm-ascend

vLLM Ascend plugin is a backend plugin designed to run vLLM on the Ascend NPU. It provides a hardware-pluggable interface that allows popular open-source models to run seamlessly on the Ascend NPU. The plugin is recommended within the vLLM community and adheres to the principles of hardware pluggability outlined in the RFC. Users can set up their environment with specific hardware and software prerequisites to utilize this plugin effectively.

github

: 410

extensionOS

Extension | OS is an open-source browser extension that brings AI directly to users' web browsers, allowing them to access powerful models like LLMs seamlessly. Users can create prompts, fix grammar, and access intelligent assistance without switching tabs. The extension aims to revolutionize online information interaction by integrating AI into everyday browsing experiences. It offers features like Prompt Factory for tailored prompts, seamless LLM model access, secure API key storage, and a Mixture of Agents feature. The extension was developed to empower users to unleash their creativity with custom prompts and enhance their browsing experience with intelligent assistance.

github

: 73

Easy-Voice-Toolkit

Easy Voice Toolkit is a toolkit based on open source voice projects, providing automated audio tools including speech model training. Users can seamlessly integrate functions like audio processing, voice recognition, voice transcription, dataset creation, model training, and voice conversion to transform raw audio files into ideal speech models. The toolkit supports multiple languages and is currently only compatible with Windows systems. It acknowledges the contributions of various projects and offers local deployment options for both users and developers. Additionally, cloud deployment on Google Colab is available. The toolkit has been tested on Windows OS devices and includes a FAQ section and terms of use for academic exchange purposes.

github

: 641

pluto

Pluto is a development tool dedicated to helping developers **build cloud and AI applications more conveniently** , resolving issues such as the challenging deployment of AI applications and open-source models. Developers are able to write applications in familiar programming languages like **Python and TypeScript** , **directly defining and utilizing the cloud resources necessary for the application within their code base** , such as AWS SageMaker, DynamoDB, and more. Pluto automatically deduces the infrastructure resource needs of the app through **static program analysis** and proceeds to create these resources on the specified cloud platform, **simplifying the resources creation and application deployment process**.

github

: 90

workbench-example-hybrid-rag

This NVIDIA AI Workbench project is designed for developing a Retrieval Augmented Generation application with a customizable Gradio Chat app. It allows users to embed documents into a locally running vector database and run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or using microservices via NVIDIA Inference Microservices (NIMs). The project supports various models with different quantization options and provides tutorials for using different inference modes. Users can troubleshoot issues, customize the Gradio app, and access advanced tutorials for specific tasks.

github

: 106

ai-chat-android

AI Chat Android demonstrates Google's Generative AI on Android with Firebase Realtime Database. It showcases Gemini API integration, Jetpack Compose UI elements, Android architecture components with Hilt, Kotlin Coroutines for background tasks, and Firebase Realtime Database integration for real-time events. The project follows Google's official architecture guidance with a modularized structure for reusability, parallel building, and decentralized focusing.

github

: 88

GenerativeAIExamples

NVIDIA Generative AI Examples are state-of-the-art examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs. These examples showcase the capabilities of NVIDIA's Generative AI platform, which includes tools, frameworks, and models for building and deploying generative AI applications.

github

: 2.2k

For similar tasks

AI-Gateway

github

: 344

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k

AI-Gateway

README:

APIM ❤️ OpenAI - 🧪 Labs for the GenAI Gateway capabilities of Azure API Management

What's new ✨

Contents

🧠 GenAI Gateway

🧪 Labs

Current Labs

🧪 Backend pool load balancing (built-in)

🧪 SLM self-hosting (phy-3)

Deprecated Labs

🧪 Advanced load balancing (custom)

Backlog of Labs

🚀 Getting Started

Prerequisites

Quickstart

🔨 Tools

🏛️ Well-Architected Framework

🎒 Show and tell

🥇 Other resources

🌐 WW GBB initiative

Disclaimer

For Tasks:

For Jobs:

Alternative AI tools for AI-Gateway

Similar Open Source Tools

AI-Gateway

Conversation-Knowledge-Mining-Solution-Accelerator

llm-twin-course

second-brain-ai-assistant-course

rag-time

OpenContracts

duix.ai

codegate

CursorLens

vllm-ascend

extensionOS

Easy-Voice-Toolkit

pluto

workbench-example-hybrid-rag

ai-chat-android

GenerativeAIExamples

For similar tasks

AI-Gateway

For similar jobs

sweep

teams-ai

ai-guide

classifai

chatbot-ui

BricksLLM

uAgents

griptape