AI-Gateway
APIM ❤️ OpenAI - this repo contains a set of experiments on using GenAI capabilities of Azure API Management with Azure OpenAI and other services
Stars: 155
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
README:
APIM ❤️ OpenAI - 🧪 Labs for the GenAI Gateway capabilities of Azure API Management
➕ the Model routing lab with OpenAI model-based routing.
➕ the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management.
➕ priority and weight parameters to the Backend pool load balancing lab.
➕ the Streaming tool to test OpenAI streaming with Azure API Management.
➕ the Tracing tool to debug and troubleshoot OpenAI APIs using Azure API Management tracing capability.
➕ image processing to the GPT-4o inferencing lab.
➕ the Function calling lab with a sample API on Azure Functions.
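The priority and weight parameters mentioned above can be pictured with a small selection routine. This is a minimal sketch under stated assumptions, not Azure API Management's implementation: it assumes a lower `priority` value marks the preferred group, that only healthy backends (for example, those whose circuit breaker is closed) are eligible, and that traffic inside the chosen group is split in proportion to `weight`. The `Backend` class, `pick_backend` function and backend names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Backend:
    name: str
    priority: int         # assumption: lower value = preferred priority group
    weight: int           # relative share of traffic inside its priority group
    healthy: bool = True  # e.g. False while a circuit breaker is open


def pick_backend(pool: Sequence[Backend], rng: random.Random) -> Optional[Backend]:
    """Return a backend from the best available priority group, chosen by weight."""
    eligible = [b for b in pool if b.healthy and b.weight > 0]
    if not eligible:
        return None  # nothing available; a gateway would return an error instead
    best_priority = min(b.priority for b in eligible)
    group = [b for b in eligible if b.priority == best_priority]
    # random.choices draws one element with probability proportional to its weight
    return rng.choices(group, weights=[b.weight for b in group], k=1)[0]


if __name__ == "__main__":
    pool = [
        Backend("openai-primary-1", priority=1, weight=3),
        Backend("openai-primary-2", priority=1, weight=1),
        Backend("openai-fallback", priority=2, weight=1),
    ]
    rng = random.Random(7)
    counts: Dict[str, int] = {}
    for _ in range(1000):
        chosen = pick_backend(pool, rng)
        counts[chosen.name] = counts.get(chosen.name, 0) + 1
    print(counts)  # roughly 3:1 between the priority-1 backends, fallback unused
```

Marking both priority-1 backends as unhealthy makes every call fall through to `openai-fallback`, which is the kind of failover that a priority setting is meant to enable.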
- 🧠 GenAI Gateway
- 🧪 Labs
- 🚀 Getting started
- 🔨 Tools
- 🏛️ Well-Architected Framework
- 🎒 Show and tell
- 🥇 Other Resources
The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.
AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.
With the expanding horizons of AI services and their seamless integration with APIs, there is considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. The aim is to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provide a framework for the confident deployment of Intelligent Apps into production.
This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management play a crucial role within these labs, handling AI services APIs with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLMs). However, the same principles and design patterns could potentially be applied to any LLM.
Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions using Python scripts, Bicep files and Azure API Management policies:
🧪 Backend pool load balancing (built-in) | 🧪 Advanced load balancing (custom) |
---|---|
Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers. | Playground to try the advanced load balancing (based on a custom Azure API Management policy) to either a list of Azure OpenAI endpoints or mock servers. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 Access controlling | 🧪 Token rate limiting |
Playground to try the OAuth 2.0 authorization feature using an identity provider to enable more fine-grained access to OpenAI APIs by particular users or clients. | Playground to try the token rate limiting policy on one or more Azure OpenAI endpoints. When the token limit is exceeded, the caller receives a 429 (Too Many Requests) response. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 Token metrics emitting | 🧪 Semantic caching |
Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs. | Playground to try the semantic caching policy. It uses the vector proximity of the prompt to previous requests and a specified similarity score threshold. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 Response streaming | 🧪 Vector searching |
Playground to try response streaming with Azure API Management and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming. | Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 Built-in logging | 🧪 SLM self-hosting (Phi-3) |
Playground to try the built-in logging capabilities of Azure API Management. Logs requests into App Insights to track details and token usage. | Playground to try the self-hosted Phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 GPT-4o inferencing | 🧪 Message storing |
Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats. | Playground to test storing message details into Cosmos DB through the Log to event hub policy. With the policy we can control which data will be stored in the DB (prompt, completion, model, region, tokens etc.). |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 Developer tooling (WIP) | 🧪 Function calling |
Playground to try the developer tooling available with Azure API Management to develop, debug, test and publish AI Service APIs. | Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
🧪 Model Routing | 🧪 Prompt flow |
Playground to try routing to a backend based on Azure OpenAI model and version. | Playground to try the Azure AI Studio Prompt Flow with Azure API Management. |
🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook | 🦾 Bicep ➕ ⚙️ Policy ➕ 🧾 Notebook |
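The Semantic caching lab rests on one idea: reuse an earlier completion when a new prompt lies close enough, in vector space, to a prompt already answered. The toy cache below sketches that idea in plain Python. It assumes the caller already has an embedding vector for each prompt (in practice these would come from an embeddings model, which is not modeled here) and it uses cosine similarity against a fixed minimum threshold. It does not reproduce the policy's own threshold semantics, storage backend or expiry; `SemanticCache`, `lookup` and `store` are hypothetical names.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy in-memory semantic cache keyed by prompt embeddings."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # minimum similarity required for a cache hit
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, embedding: Vector) -> Optional[str]:
        """Return the cached response of the most similar prompt, if similar enough."""
        best_score, best_response = -1.0, None
        for cached_embedding, response in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, embedding: Vector, response: str) -> None:
        """Remember a completion so that similar future prompts can reuse it."""
        self.entries.append((embedding, response))


if __name__ == "__main__":
    cache = SemanticCache(threshold=0.95)
    # Hypothetical 3-dimensional embeddings; real embedding vectors are much longer.
    cache.store([1.0, 0.0, 0.2], "Paris is the capital of France.")
    print(cache.lookup([0.98, 0.02, 0.21]))  # near-duplicate prompt: cache hit
    print(cache.lookup([0.0, 1.0, 0.0]))     # unrelated prompt: None (cache miss)
```

A linear scan keeps the sketch readable; a production cache would hand the nearest-neighbour search to a vector index instead.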
- Assistants load balancing
- Logic Apps RAG
- Semantic Kernel plugin
- Content filtering
- PII handling
- Prompt guarding
- Llama inferencing
[!TIP] Kindly use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas or lab requests.
- Python 3.8 or later version installed
- VS Code installed with the Jupyter notebook extension enabled
- Azure CLI installed
- An Azure Subscription with Contributor permissions
- Access granted to Azure OpenAI or just enable the mock service
- Sign in to Azure with Azure CLI
- Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it in the browser or in VS Code.
- Navigate through the available labs and select one that best suits your needs. For starters, we recommend the backend pool load balancing lab.
- Open the notebook and run the provided steps.
- Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.
[!NOTE] 🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.
- The AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, creating an efficient simulation environment for testing and developing the integration with Azure API Management and other use cases. The app.py file can be customized to tailor the Mock server to specific use cases.
- Tracing - Invokes the OpenAI API with tracing enabled and returns the tracing information.
- Streaming - Invokes the OpenAI API with streaming enabled and returns the response in chunks.
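The Streaming tool receives the completion as a sequence of chunks. The sketch below pairs a tiny generator that imitates OpenAI-style server-sent events, in the spirit of the Mock server, with a parser that reassembles the text. It assumes each event is a `data: {json}` line whose `choices[].delta.content` carries a text fragment and that the stream ends with `data: [DONE]`; the other chunk fields are illustrative only. `mock_stream` and `collect_stream` are hypothetical helpers, not functions from the repo's `app.py`.

```python
import json
from typing import Iterable, Iterator


def mock_stream(text: str, model: str = "mock-model") -> Iterator[str]:
    """Yield SSE lines imitating OpenAI chat completion chunks (illustrative shape)."""
    words = text.split(" ")
    pieces = [w if i == 0 else " " + w for i, w in enumerate(words)]
    for piece in pieces:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece}, "finish_reason": None}],
        }
        yield "data: " + json.dumps(chunk)
        yield ""  # a blank line terminates each server-sent event
    final = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield "data: " + json.dumps(final)
    yield ""
    yield "data: [DONE]"


def collect_stream(lines: Iterable[str]) -> str:
    """Reassemble the assistant text from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # Some chunks may carry an empty choices list; they contribute no text
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    print(collect_stream(mock_stream("Hello from the mock server")))
```

Running the file prints `Hello from the mock server`. Pointing `collect_stream` at the lines of a real streamed HTTP response instead of the mock is plausible but untested here.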
The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps the labs to the Well-Architected Framework pillars to set you up for success through architectural experimentation.
Lab | Security | Reliability | Performance | Operations | Costs |
---|---|---|---|---|---|
Request forwarding | ⭐ | | | | |
Backend circuit breaking | ⭐ | ⭐ | | | |
Backend pool load balancing | ⭐ | ⭐ | ⭐ | | |
Advanced load balancing | ⭐ | ⭐ | ⭐ | | |
Response streaming | ⭐ | ⭐ | | | |
Vector searching | ⭐ | ⭐ | ⭐ | | |
Built-in logging | ⭐ | ⭐ | ⭐ | ⭐ | ⭐ |
SLM self-hosting | ⭐ | ⭐ | | | |
[!TIP] Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for additional guidance.
[!TIP] Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the bottom to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.
Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.
- AI Hub Gateway Landing Zone
- GenAI Gateway Guide
- Azure OpenAI + APIM Sample
- AI+API better together: Benefits & Best Practices using APIs for AI workloads
- Designing and implementing a gateway solution with Azure OpenAI resources
- Azure OpenAI Using PTUs/TPMs With API Management - Using the Scaling Special Sauce
- Manage Azure OpenAI using APIM
- Setting up Azure OpenAI as a central capability with Azure API Management
- Introduction to Building AI Apps
We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.
[!IMPORTANT] This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.
Alternative AI tools for AI-Gateway
Similar Open Source Tools
redis-ai-resources
A curated repository of code recipes, demos, and resources for basic and advanced Redis use cases in the AI ecosystem. It includes demos for ArxivChatGuru, Redis VSS, Vertex AI & Redis, Agentic RAG, ArXiv Search, and Product Search. Recipes cover topics like Getting started with RAG, Semantic Cache, Advanced RAG, and Recommendation systems. The repository also provides integrations/tools like RedisVL, AWS Bedrock, LangChain Python, LangChain JS, LlamaIndex, Semantic Kernel, RelevanceAI, and DocArray. Additional content includes blog posts, talks, reviews, and documentation related to Vector Similarity Search, AI-Powered Document Search, Vector Databases, Real-Time Product Recommendations, and more. Benchmarks compare Redis against other Vector Databases and ANN benchmarks. Documentation includes QuickStart guides, official literature for Vector Similarity Search, Redis-py client library docs, Redis Stack documentation, and Redis client list.
findto
Findto is a decentralized search tool for the Web and AI that puts people in control of algorithms. It aims to provide a better search experience by offering diverse sources, privacy and carbon level information, trends exploration, autosuggest, voice search, and more. Findto encourages a free search experience and promotes a healthier internet by empowering users with democratic choices.
SemanticFinder
SemanticFinder is a frontend-only live semantic search tool that calculates embeddings and cosine similarity client-side using transformers.js and SOTA embedding models from Huggingface. It allows users to search through large texts like books with pre-indexed examples, customize search parameters, and offers data privacy by keeping input text in the browser. The tool can be used for basic search tasks, analyzing texts for recurring themes, and has potential integrations with various applications like wikis, chat apps, and personal history search. It also provides options for building browser extensions and future ideas for further enhancements and integrations.
FFAIVideo
FFAIVideo is a lightweight node.js project that utilizes popular AI LLM to intelligently generate short videos. It supports multiple AI LLM models such as OpenAI, Moonshot, Azure, g4f, Google Gemini, etc. Users can input text to automatically synthesize exciting video content with subtitles, background music, and customizable settings. The project integrates Microsoft Edge's online text-to-speech service for voice options and uses Pexels website for video resources. Installation of FFmpeg is essential for smooth operation. Inspired by MoneyPrinterTurbo, MoneyPrinter, and MsEdgeTTS, FFAIVideo is designed for front-end developers with minimal dependencies and simple usage.
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobufs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
buffer-of-thought-llm
Buffer of Thoughts (BoT) is a thought-augmented reasoning framework designed to enhance the accuracy, efficiency, and robustness of large language models (LLMs). It introduces a meta-buffer to store high-level thought-templates distilled from problem-solving processes, enabling adaptive reasoning for efficient problem-solving. The framework includes a buffer-manager to dynamically update the meta-buffer, ensuring scalability and stability. BoT achieves significant performance improvements on reasoning-intensive tasks and demonstrates superior generalization ability and robustness while being cost-effective compared to other methods.
kubesphere
KubeSphere is a distributed operating system for cloud-native application management, using Kubernetes as its kernel. It provides a plug-and-play architecture, allowing third-party applications to be seamlessly integrated into its ecosystem. KubeSphere is also a multi-tenant container platform with full-stack automated IT operation and streamlined DevOps workflows. It provides developer-friendly wizard web UI, helping enterprises to build out a more robust and feature-rich platform, which includes most common functionalities needed for enterprise Kubernetes strategy.
auto-dev
AutoDev is an AI-powered coding wizard that supports multiple languages, including Java, Kotlin, JavaScript/TypeScript, Rust, Python, Golang, C/C++/OC, and more. It offers a range of features, including auto development mode, copilot mode, chat with AI, customization options, SDLC support, custom AI agent integration, and language features such as language support, extensions, and a DevIns language for AI agent development. AutoDev is designed to assist developers with tasks such as auto code generation, bug detection, code explanation, exception tracing, commit message generation, code review content generation, smart refactoring, Dockerfile generation, CI/CD config file generation, and custom shell/command generation. It also provides a built-in LLM fine-tune model and supports UnitEval for LLM result evaluation and UnitGen for code-LLM fine-tune data generation.
generative-ai-cdk-constructs
The AWS Generative AI Constructs Library is an open-source extension of the AWS Cloud Development Kit (AWS CDK) that provides multi-service, well-architected patterns for quickly defining solutions in code to create predictable and repeatable infrastructure, called constructs. The goal of AWS Generative AI CDK Constructs is to help developers build generative AI solutions using pattern-based definitions for their architecture. The patterns defined in AWS Generative AI CDK Constructs are high level, multi-service abstractions of AWS CDK constructs that have default configurations based on well-architected best practices. The library is organized into logical modules using object-oriented techniques to create each architectural pattern model.
langfuse
Langfuse is a powerful tool that helps you develop, monitor, and test your LLM applications. With Langfuse, you can: * **Develop:** Instrument your app and start ingesting traces to Langfuse, inspect and debug complex logs, and manage, version, and deploy prompts from within Langfuse. * **Monitor:** Track metrics (cost, latency, quality) and gain insights from dashboards & data exports, collect and calculate scores for your LLM completions, run model-based evaluations, collect user feedback, and manually score observations in Langfuse. * **Test:** Track and test app behaviour before deploying a new version, test expected in and output pairs and benchmark performance before deploying, and track versions and releases in your application. Langfuse is easy to get started with and offers a generous free tier. You can sign up for Langfuse Cloud or deploy Langfuse locally or on your own infrastructure. Langfuse also offers a variety of integrations to make it easy to connect to your LLM applications.
llm-datasets
LLM Datasets is a repository containing high-quality datasets, tools, and concepts for LLM fine-tuning. It provides datasets with characteristics like accuracy, diversity, and complexity to train large language models for various tasks. The repository includes datasets for general-purpose, math & logic, code, conversation & role-play, and agent & function calling domains. It also offers guidance on creating high-quality datasets through data deduplication, data quality assessment, data exploration, and data generation techniques.
taranis-ai
Taranis AI is an advanced Open-Source Intelligence (OSINT) tool that leverages Artificial Intelligence to revolutionize information gathering and situational analysis. It navigates through diverse data sources like websites to collect unstructured news articles, utilizing Natural Language Processing and Artificial Intelligence to enhance content quality. Analysts then refine these AI-augmented articles into structured reports that serve as the foundation for deliverables such as PDF files, which are ultimately published.
Cherry_LLM
Cherry Data Selection project introduces a self-guided methodology for LLMs to autonomously discern and select cherry samples from open-source datasets, minimizing manual curation and cost for instruction tuning. The project focuses on selecting impactful training samples ('cherry data') to enhance LLM instruction tuning by estimating instruction-following difficulty. The method involves phases like 'Learning from Brief Experience', 'Evaluating Based on Experience', and 'Retraining from Self-Guided Experience' to improve LLM performance.
awesome-MLSecOps
Awesome MLSecOps is a curated list of open-source tools, resources, and tutorials for MLSecOps (Machine Learning Security Operations). It includes a wide range of security tools and libraries for protecting machine learning models against adversarial attacks, as well as resources for AI security, data anonymization, model security, and more. The repository aims to provide a comprehensive collection of tools and information to help users secure their machine learning systems and infrastructure.
Streamline-Analyst
Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.