AI-Gateway
APIM β€οΈ OpenAI - this repo contains a set of experiments on using GenAI capabilities of Azure API Management with Azure OpenAI and other services
Stars: 155
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
README:
APIM β€οΈ OpenAI - π§ͺ Labs for the GenAI Gateway capabilities of Azure API Management
β the Model routing lab with OpenAI model based routing.
β the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management.
β priority
and weight
parameters to the Backend pool load balancing lab.
β the Streaming tool to test OpenAI streaming with Azure API Management.
β the Tracing tool to debug and troubleshoot OpenAI APIs using Azure API Management tracing capability.
β image processing to the GPT-4o inferencing lab.
β the Function calling lab with a sample API on Azure Functions.
- π§ GenAI Gateway
- π§ͺ Labs
- π Getting started
- π¨ Tools
- ποΈ Well-Architected Framework
- π Show and tell
- π₯ Other Resources
The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.
AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.
With the expanding horizons of AI services and their seamless integration with APIs, there is a considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. Aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provides a framework for the confident deployment of Intelligent Apps into production.
This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management plays a crucial role within these labs, handling AI services APIs, with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLM). However, the same principles and design patterns could potentially be applied to any LLM.
Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, Bicep files and Azure API Management policies:
π§ͺ Backend pool load balancing (built-in) | π§ͺ Advanced load balancing (custom) |
Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers. | Playground to try the advanced load balancing (based on a custom Azure API Management policy) to either a list of Azure OpenAI endpoints or mock servers. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ Access controlling | π§ͺ Token rate limiting |
Playground to try the OAuth 2.0 authorization feature using identity provider to enable more fine-grained access to OpenAPI APIs by particular users or client. | Playground to try the token rate limiting policy to one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ Token metrics emitting | π§ͺ Semantic caching |
Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs. | Playground to try the sementic caching policy. Uses vector proximity of the prompt to previous requests and a specified similarity score threshold. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ Response streaming | π§ͺ Vector searching |
Playground to try response streaming with Azure API Management and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming. | Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ Built-in logging | π§ͺ SLM self-hosting (phy-3) |
Playground to try the buil-in logging capabilities of Azure API Management. Logs requests into App Insights to track details and token usage. | Playground to try the self-hosted phy-3 Small Language Model (SLM) trough the Azure API Management self-hosted gateway with OpenAI API compatibility. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ GPT-4o inferencing | π§ͺ Message storing |
Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats. | Playground to test storing message details into Cosmos DB through the Log to event hub policy. With the policy we can control which data will be stored in the DB (prompt, completion, model, region, tokens etc.). |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ Developer tooling (WIP) | π§ͺ Function calling |
Playground to try the developer tooling available with Azure API Management to develop, debug, test and publish AI Service APIs. | Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
π§ͺ Model Routing | π§ͺ Prompt flow |
Playground to try routing to a backend based on Azure OpenAI model and version. | Playground to try the Azure AI Studio Prompt Flow with Azure API Management. |
π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ | π¦Ύ Bicep β βοΈ Policy β π§Ύ Notebook π° π¬ |
- Assistants load balancing
- Logic Apps RAG
- Semantic Kernel plugin
- Content filtering
- PII handling
- Prompt guarding
- Llama inferencing
[!TIP] Kindly use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas or lab requests.
- Python 3.8 or later version installed
- VS Code installed with the Jupyter notebook extension enabled
- Azure CLI installed
- An Azure Subscription with Contributor permissions
- Access granted to Azure OpenAI or just enable the mock service
- Sign in to Azure with Azure CLI
- Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it on the browser or in VS Code.
- Navigate through the available labs and select one that best suits your needs. For starters we recommend the backend pool load balancing.
- Open the notebook and run the provided steps.
- Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.
[!NOTE] πͺ² Please feel free to open a new issue if you find something that should be fixed or enhanced.
- AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, thereby creating an efficient simulation environment suitable for testing and development purposes on the integration with Azure API Management and other use cases. The app.py can be customized to tailor the Mock server to specific use cases.
- Tracing - Invoke OpenAI API with trace enabled and returns the tracing information.
- Streaming - Invoke OpenAI API with stream enabled and returns response in chunks.
The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs with the Well-Architected Framework pillars to set you up for success through architectural experimentation.
Lab | Security | Reliability | Performance | Operations | Costs |
---|---|---|---|---|---|
Request forwarding | β | ||||
Backend circuit breaking | β | β | |||
Backend pool load balancing | β | β | β | ||
Advanced load balancing | β | β | β | ||
Response streaming | β | β | |||
Vector searching | β | β | β | ||
Built-in logging | β | β | β | β | β |
SLM self-hosting | β | β |
[!TIP] Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for aditional guidance.
[!TIP] Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the botton to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.
Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.
- AI Hub Gateway Landing Zone
- GenAI Gateway Guide
- Azure OpenAIΒ +Β APIM Sample
- AI+API better together: Benefits & Best Practices using APIs for AI workloads
- Designing and implementing a gateway solution with Azure OpenAI resources
- Azure OpenAI Using PTUs/TPMs With API Management - Using the Scaling Special Sauce
- Manage Azure OpenAI using APIM
- Setting up Azure OpenAI as a central capability with Azure API Management
- Introduction to Building AI Apps
We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.
[!IMPORTANT] This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for AI-Gateway
Similar Open Source Tools
AI-Gateway
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
mage-ai
Mage is an open-source data pipeline tool for transforming and integrating data. It offers an easy developer experience, engineering best practices built-in, and data as a first-class citizen. Mage makes it easy to build, preview, and launch data pipelines, and provides observability and scaling capabilities. It supports data integrations, streaming pipelines, and dbt integration.
airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
speakeasy
Speakeasy is a tool that helps developers create production-quality SDKs, Terraform providers, documentation, and more from OpenAPI specifications. It supports a wide range of languages, including Go, Python, TypeScript, Java, and C#, and provides features such as automatic maintenance, type safety, and fault tolerance. Speakeasy also integrates with popular package managers like npm, PyPI, Maven, and Terraform Registry for easy distribution.
redis-ai-resources
A curated repository of code recipes, demos, and resources for basic and advanced Redis use cases in the AI ecosystem. It includes demos for ArxivChatGuru, Redis VSS, Vertex AI & Redis, Agentic RAG, ArXiv Search, and Product Search. Recipes cover topics like Getting started with RAG, Semantic Cache, Advanced RAG, and Recommendation systems. The repository also provides integrations/tools like RedisVL, AWS Bedrock, LangChain Python, LangChain JS, LlamaIndex, Semantic Kernel, RelevanceAI, and DocArray. Additional content includes blog posts, talks, reviews, and documentation related to Vector Similarity Search, AI-Powered Document Search, Vector Databases, Real-Time Product Recommendations, and more. Benchmarks compare Redis against other Vector Databases and ANN benchmarks. Documentation includes QuickStart guides, official literature for Vector Similarity Search, Redis-py client library docs, Redis Stack documentation, and Redis client list.
findto
Findto is a decentralized search tool for the Web and AI that puts people in control of algorithms. It aims to provide a better search experience by offering diverse sources, privacy and carbon level information, trends exploration, autosuggest, voice search, and more. Findto encourages a free search experience and promotes a healthier internet by empowering users with democratic choices.
flowgen
FlowGen is a tool built for AutoGen, a great agent framework from Microsoft and a lot of contributors. It provides intuitive visual tools that streamline the construction and oversight of complex agent-based workflows, simplifying the process for creators and developers. Users can create Autoflows, chat with agents, and share flow templates. The tool is fully dockerized and supports deployment on Railway.app. Contributions to the project are welcome, and the platform uses semantic-release for versioning and releases.
inference
Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models.
cambrian
Cambrian-1 is a fully open project focused on exploring multimodal Large Language Models (LLMs) with a vision-centric approach. It offers competitive performance across various benchmarks with models at different parameter levels. The project includes training configurations, model weights, instruction tuning data, and evaluation details. Users can interact with Cambrian-1 through a Gradio web interface for inference. The project is inspired by LLaVA and incorporates contributions from Vicuna, LLaMA, and Yi. Cambrian-1 is licensed under Apache 2.0 and utilizes datasets and checkpoints subject to their respective original licenses.
llm-graph-builder
Knowledge Graph Builder App is a tool designed to convert PDF documents into a structured knowledge graph stored in Neo4j. It utilizes OpenAI's GPT/Diffbot LLM to extract nodes, relationships, and properties from PDF text content. Users can upload files from local machine or S3 bucket, choose LLM model, and create a knowledge graph. The app integrates with Neo4j for easy visualization and querying of extracted information.
generative-ai-cdk-constructs
The AWS Generative AI Constructs Library is an open-source extension of the AWS Cloud Development Kit (AWS CDK) that provides multi-service, well-architected patterns for quickly defining solutions in code to create predictable and repeatable infrastructure, called constructs. The goal of AWS Generative AI CDK Constructs is to help developers build generative AI solutions using pattern-based definitions for their architecture. The patterns defined in AWS Generative AI CDK Constructs are high level, multi-service abstractions of AWS CDK constructs that have default configurations based on well-architected best practices. The library is organized into logical modules using object-oriented techniques to create each architectural pattern model.
Awesome-LLM
Awesome-LLM is a curated list of resources related to large language models, focusing on papers, projects, frameworks, tools, tutorials, courses, opinions, and other useful resources in the field. It covers trending LLM projects, milestone papers, other papers, open LLM projects, LLM training frameworks, LLM evaluation frameworks, tools for deploying LLM, prompting libraries & tools, tutorials, courses, books, and opinions. The repository provides a comprehensive overview of the latest advancements and resources in the field of large language models.
synmetrix
Synmetrix is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube.js to consolidate metrics from various sources and distribute them downstream via a SQL API. Use cases include data democratization, business intelligence and reporting, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
auto-dev
AutoDev is an AI-powered coding wizard that supports multiple languages, including Java, Kotlin, JavaScript/TypeScript, Rust, Python, Golang, C/C++/OC, and more. It offers a range of features, including auto development mode, copilot mode, chat with AI, customization options, SDLC support, custom AI agent integration, and language features such as language support, extensions, and a DevIns language for AI agent development. AutoDev is designed to assist developers with tasks such as auto code generation, bug detection, code explanation, exception tracing, commit message generation, code review content generation, smart refactoring, Dockerfile generation, CI/CD config file generation, and custom shell/command generation. It also provides a built-in LLM fine-tune model and supports UnitEval for LLM result evaluation and UnitGen for code-LLM fine-tune data generation.
mlcraft
Synmetrix (prev. MLCraft) is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube (Cube.js) for flexible data models that consolidate metrics from various sources, enabling downstream distribution via a SQL API for integration into BI tools, reporting, dashboards, and data science. Use cases include data democratization, business intelligence, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
GenAIExamples
This project provides a collective list of Generative AI (GenAI) and Retrieval-Augmented Generation (RAG) examples such as chatbot with question and answering (ChatQnA), code generation (CodeGen), document summary (DocSum), etc.
For similar tasks
AI-Gateway
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.