
ai-gradio
A Python package that makes it easy for developers to create AI apps powered by various AI providers.
Stars: 1573

ai-gradio is a Python package that simplifies building machine learning apps on top of providers such as OpenAI, Google's Gemini, Anthropic's Claude, LumaAI, CrewAI, XAI's Grok, and Hyperbolic. Each provider is installed as an optional extra, and the package offers text chat, voice chat, video chat, code generation interfaces, and AI agent teams. Users set an API key for each provider they use and can customize interfaces for specific tasks.
README:
A Python package that makes it easy for developers to create machine learning apps powered by various AI providers. Built on top of Gradio, it provides a unified interface for multiple AI models and services.
- Multi-Provider Support: Integrate with 15+ AI providers including OpenAI, Google Gemini, Anthropic, and more
- Text Chat: Interactive chat interfaces for all text models
- Voice Chat: Real-time voice interactions with OpenAI models
- Video Chat: Video processing capabilities with Gemini models
- Code Generation: Specialized interfaces for coding assistance
- Multi-Modal: Support for text, image, and video inputs
- Agent Teams: CrewAI integration for collaborative AI tasks
- Browser Automation: AI agents that can perform web-based tasks
- Computer-Use: AI agents that can control a virtual local macOS/Linux environment
Provider | Models |
---|---|
OpenAI | gpt-4-turbo, gpt-4, gpt-3.5-turbo |
Anthropic | claude-3-opus, claude-3-sonnet, claude-3-haiku |
Gemini | gemini-pro, gemini-pro-vision, gemini-2.0-flash-exp |
Groq | llama-3.2-70b-chat, mixtral-8x7b-chat |

Provider | Type | Models |
---|---|---|
LumaAI | Generation | dream-machine, photon-1 |
DeepSeek | Multi-purpose | deepseek-chat, deepseek-coder, deepseek-vision |
CrewAI | Agent Teams | Support Team, Article Team |
Qwen | Language | qwen-turbo, qwen-plus, qwen-max |
Browser | Automation | browser-use-agent |
Cua | Computer-Use | OpenAI Computer-Use Preview |
# Install core package
pip install ai-gradio
# Install with specific provider support
pip install 'ai-gradio[openai]' # OpenAI support
pip install 'ai-gradio[gemini]' # Google Gemini support
pip install 'ai-gradio[anthropic]' # Anthropic Claude support
pip install 'ai-gradio[groq]' # Groq support
# Install all providers
pip install 'ai-gradio[all]'
# Install additional providers individually
pip install 'ai-gradio[crewai]' # CrewAI support
pip install 'ai-gradio[lumaai]' # LumaAI support
pip install 'ai-gradio[xai]' # XAI/Grok support
pip install 'ai-gradio[cohere]' # Cohere support
pip install 'ai-gradio[sambanova]' # SambaNova support
pip install 'ai-gradio[hyperbolic]' # Hyperbolic support
pip install 'ai-gradio[deepseek]' # DeepSeek support
pip install 'ai-gradio[smolagents]' # SmolagentsAI support
pip install 'ai-gradio[fireworks]' # Fireworks support
pip install 'ai-gradio[together]' # Together support
pip install 'ai-gradio[qwen]' # Qwen support
pip install 'ai-gradio[browser]' # Browser support
pip install 'ai-gradio[cua]' # Computer-Use support
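Standard pip syntax also accepts several extras in one bracket, separated by commas. The following is a minimal sketch that builds such a command from a list of extras. `install_command` is a hypothetical helper written for illustration, not part of ai-gradio, and the extras names are the ones listed above.

```python
import shlex


def install_command(extras):
    """Build a pip command that installs ai-gradio with the given extras.

    Hypothetical helper for illustration; not part of ai-gradio.
    """
    requirement = "ai-gradio"
    if extras:
        # Comma-separated extras inside one bracket is standard pip syntax.
        requirement += "[" + ",".join(sorted(set(extras))) + "]"
    # shlex.quote adds shell quoting only when the string needs it.
    return "pip install " + shlex.quote(requirement)


print(install_command(["openai", "gemini"]))
```

Sorting and de-duplicating the extras keeps the generated command stable regardless of input order.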
# Core Providers
export OPENAI_API_KEY=<your token>
export GEMINI_API_KEY=<your token>
export ANTHROPIC_API_KEY=<your token>
export GROQ_API_KEY=<your token>
export TAVILY_API_KEY=<your token> # Required for Langchain agents
# Additional Providers (as needed)
export LUMAAI_API_KEY=<your token>
export XAI_API_KEY=<your token>
export COHERE_API_KEY=<your token>
# ... (other provider keys)
# Twilio credentials (required for WebRTC voice chat)
export TWILIO_ACCOUNT_SID=<your Twilio account SID>
export TWILIO_AUTH_TOKEN=<your Twilio auth token>
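Before launching an app, it can help to confirm that the keys for the providers in use are actually set. A minimal sketch using only the standard library follows. `missing_keys` and `CORE_KEYS` are hypothetical names chosen for this example; the variable names themselves come from the list above, and only the keys for providers you use are needed.

```python
import os

# Environment variable names taken from the list above; trim or extend as needed.
CORE_KEYS = ["OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY"]


def missing_keys(names, env=None):
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


print("Missing keys:", missing_keys(CORE_KEYS))
```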
import gradio as gr
import ai_gradio

# Create a simple chat interface
gr.load(
    name='openai:gpt-4-turbo',  # or 'gemini:gemini-1.5-flash', 'groq:llama-3.2-70b-chat'
    src=ai_gradio.registry,
    title='AI Chat',
    description='Chat with an AI model'
).launch()

# Create a chat interface with Transformers models
gr.load(
    name='transformers:phi-4',  # or 'transformers:tulu-3', 'transformers:olmo-2-13b'
    src=ai_gradio.registry,
    title='Local AI Chat',
    description='Chat with locally running models'
).launch()

# Create a coding assistant with OpenAI
gr.load(
    name='openai:gpt-4-turbo',
    src=ai_gradio.registry,
    coder=True,
    title='OpenAI Code Assistant',
    description='OpenAI Code Generator'
).launch()

# Create a coding assistant with Gemini
gr.load(
    name='gemini:gemini-2.0-flash-thinking-exp-1219',  # or 'openai:gpt-4-turbo', 'anthropic:claude-3-opus'
    src=ai_gradio.registry,
    coder=True,
    title='Gemini Code Generator',
).launch()

# Create a voice chat interface (requires the Twilio credentials for WebRTC)
gr.load(
    name='openai:gpt-4-turbo',
    src=ai_gradio.registry,
    enable_voice=True,
    title='AI Voice Assistant'
).launch()

# Create a vision-enabled interface with camera support
gr.load(
    name='gemini:gemini-2.0-flash-exp',
    src=ai_gradio.registry,
    camera=True,
).launch()
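Every example above passes a `name` of the form `'provider:model'`. The sketch below splits such a string and rejects malformed input. It only illustrates the naming convention and is not ai-gradio's own parsing code; `split_model_name` is a hypothetical helper.

```python
def split_model_name(name):
    """Split 'provider:model' into (provider, model).

    Hypothetical helper for illustration; splits on the first colon only.
    """
    provider, sep, model = name.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {name!r}")
    return provider, model


print(split_model_name("openai:gpt-4-turbo"))
```

Validating the string up front gives a clearer error than a failed lookup later on.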
import gradio as gr
import ai_gradio

# Combine several models in one app, one tab per model
with gr.Blocks() as demo:
    with gr.Tab("Text"):
        gr.load('openai:gpt-4-turbo', src=ai_gradio.registry)
    with gr.Tab("Vision"):
        gr.load('gemini:gemini-pro-vision', src=ai_gradio.registry)
    with gr.Tab("Code"):
        gr.load('deepseek:deepseek-coder', src=ai_gradio.registry)

demo.launch()
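When the number of tabs grows, the same layout can be driven from a mapping instead of being written out by hand. This is a sketch under the assumption that `gr.load` behaves inside a loop the same way it does in the explicit example above. `TABS` and `build_demo` are hypothetical names, and the imports sit inside the function so the mapping can be inspected without gradio installed.

```python
# Tab label -> model name, using the models from the example above.
TABS = {
    "Text": "openai:gpt-4-turbo",
    "Vision": "gemini:gemini-pro-vision",
    "Code": "deepseek:deepseek-coder",
}


def build_demo(tabs):
    """Build a tabbed app with one gr.load interface per mapping entry."""
    # Imported lazily so that defining the mapping does not require gradio.
    import gradio as gr
    import ai_gradio

    with gr.Blocks() as demo:
        for label, model_name in tabs.items():
            with gr.Tab(label):
                gr.load(model_name, src=ai_gradio.registry)
    return demo


# build_demo(TABS).launch()  # uncomment to start the app
```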
# Article Creation Team
gr.load(
    name='crewai:gpt-4-turbo',
    src=ai_gradio.registry,
    crew_type='article',
    title='AI Writing Team'
).launch()
# Install the Playwright browsers used for browser automation
playwright install
Browser automation requires Python 3.11+.
import gradio as gr
import ai_gradio
# Create a browser automation interface
gr.load(
    name='browser:gpt-4-turbo',
    src=ai_gradio.registry,
    title='AI Browser Assistant',
    description='Let AI help with web tasks'
).launch()
Example tasks:
- Flight searches on Google Flights
- Weather lookups
- Product price comparisons
- News searches
# Install Computer-Use Agent support
pip install 'ai-gradio[cua]'
# Install Lume daemon (macOS only)
sudo /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# Start the Lume daemon service (in a separate terminal)
lume serve
# Pull the pre-built macOS image
lume pull macos-sequoia-cua:latest --no-cache
Requires macOS with Apple Silicon (M1/M2/M3/M4) and macOS 14 (Sonoma) or newer.
import gradio as gr
import ai_gradio
from dotenv import load_dotenv
# Load API keys from .env file
load_dotenv()
# Create a computer-use automation interface with OpenAI
gr.load(
    name='cua:gpt-4-turbo',  # Format: 'cua:model_name'
    src=ai_gradio.registry,
    title='Computer-Use Agent',
    description='AI that can control a virtual macOS environment'
).launch()
Example tasks:
- Create Python virtual environments and run data analysis scripts
- Open PDFs in Preview, add annotations, and save compressed versions
- Browse Safari and manage bookmarks
- Clone and build GitHub repositories
- Configure SSH keys and remote connections
- Create automation scripts and schedule them with cron
import gradio as gr
import ai_gradio
# Create a chat interface with Swarms
gr.load(
    name='swarms:gpt-4-turbo',  # or other OpenAI models
    src=ai_gradio.registry,
    agent_name="Stock-Analysis-Agent",  # customize agent name
    title='Swarms Chat',
    description='Chat with an AI agent powered by Swarms'
).launch()
import gradio as gr
import ai_gradio
# Create a Langchain agent interface
gr.load(
    name='langchain:gpt-4-turbo',  # or other supported models
    src=ai_gradio.registry,
    title='Langchain Agent',
    description='AI agent powered by Langchain'
).launch()
Core requirements:
- Python 3.10+
- gradio >= 5.9.1
Optional feature dependencies:
- Voice Chat: gradio-webrtc, numba==0.60.0, pydub, librosa
- Video Chat: opencv-python, Pillow
- Agent Teams: crewai>=0.1.0, langchain>=0.1.0
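The two core requirements can be checked from Python before anything else is installed. The following is a rough sketch using only the standard library. `version_tuple` and `check_requirements` are hypothetical helpers, and the version parsing is deliberately simple (it keeps the leading digits of each dotted part), so it is not a full version-specifier implementation.

```python
import sys
from importlib import metadata
from itertools import takewhile


def version_tuple(text):
    """Turn '5.9.1' into (5, 9, 1), keeping only each part's leading digits."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements():
    """Return a list of human-readable problems; empty when all is fine."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    try:
        installed = metadata.version("gradio")
    except metadata.PackageNotFoundError:
        problems.append("gradio is not installed")
    else:
        if version_tuple(installed) < (5, 9, 1):
            problems.append(f"gradio {installed} is older than 5.9.1")
    return problems


print(check_requirements() or "requirements look fine")
```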
If you encounter 401 errors, verify your API keys:
import os
# Set API keys manually if needed
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["GEMINI_API_KEY"] = "your-api-key"
If you see "no providers installed" errors:
# Install specific provider
pip install 'ai-gradio[provider_name]'
# Or install all providers
pip install 'ai-gradio[all]'
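To see which provider SDKs are importable in the current environment, the standard library's `importlib.util.find_spec` can be used without importing the packages themselves. This is a sketch only. The import names in `PROVIDER_MODULES` are assumptions about which SDK each extra pulls in, and `installed_providers` is a hypothetical helper, not part of ai-gradio.

```python
from importlib.util import find_spec

# Assumed top-level import names for a few provider SDKs; adjust to your setup.
PROVIDER_MODULES = {
    "openai": "openai",
    "anthropic": "anthropic",
    "groq": "groq",
}


def installed_providers(modules):
    """Return the providers whose module can be found, sorted by name."""
    return sorted(
        provider
        for provider, module_name in modules.items()
        if find_spec(module_name) is not None
    )


print("Importable provider SDKs:", installed_providers(PROVIDER_MODULES))
```

An empty result suggests that no provider extra is installed in the active environment, which matches the "no providers installed" symptom.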
Contributions are welcome! Please feel free to submit a Pull Request.
Similar Open Source Tools


aio-scrapy
Aio-scrapy is an asyncio-based web crawling and web scraping framework inspired by Scrapy. It supports distributed crawling/scraping, implements compatibility with scrapyd, and provides options for using redis queue and rabbitmq queue. The framework is designed for fast extraction of structured data from websites. Aio-scrapy requires Python 3.9+ and is compatible with Linux, Windows, macOS, and BSD systems.

retro-aim-server
Retro AIM Server is an instant messaging server that revives AOL Instant Messenger clients from the 2000s. It supports Windows AIM client versions 5.0-5.9, away messages, buddy icons, buddy list, chat rooms, instant messaging, user profiles, blocking/visibility toggle/idle notification, and warning. The Management API provides functionality for administering the server, including listing users, creating users, changing passwords, and listing active sessions.

we-drawing
The 'we-drawing' repository is a project that generates AI images based on Bing Image DALL-E-3 using a daily Chinese ancient poem as a prompt. It automatically triggers GitHub Action, fetches poems from '今日诗词' API, and builds the website with Astro. Users can subscribe to daily poem images via RSS feed and join the '新生代程序员群' WeChat group for discussions on front-end, back-end development, and AI technology.

UMbreLLa
UMbreLLa is a tool designed for deploying Large Language Models (LLMs) for personal agents. It combines offloading, speculative decoding, and quantization to optimize single-user LLM deployment scenarios. With UMbreLLa, 70B-level models can achieve performance comparable to human reading speed on an RTX 4070Ti, delivering exceptional efficiency and responsiveness, especially for coding tasks. The tool supports deploying models on various GPUs and offers features like code completion and CLI/Gradio chatbots. Users can configure the LLM engine for optimal performance based on their hardware setup.

markdrop
Markdrop is a Python package that facilitates the conversion of PDFs to markdown format while extracting images and tables. It also generates descriptive text descriptions for extracted tables and images using various LLM clients. The tool offers additional functionalities such as PDF URL support, AI-powered image and table descriptions, interactive HTML output with downloadable Excel tables, customizable image resolution and UI elements, and a comprehensive logging system. Markdrop aims to simplify the process of handling PDF documents and enhancing their content with AI-generated descriptions.

LongLLaVA
LongLLaVA is a tool for scaling multi-modal LLMs to 1000 images efficiently via hybrid architecture. It includes stages for single-image alignment, instruction-tuning, and multi-image instruction-tuning, with evaluation through a command line interface and model inference. The tool aims to achieve GPT-4V level capabilities and beyond, providing reproducibility of results and benchmarks for efficiency and performance.

LLMVoX
LLMVoX is a lightweight 30M-parameter, LLM-agnostic, autoregressive streaming Text-to-Speech (TTS) system designed to convert text outputs from Large Language Models into high-fidelity streaming speech with low latency. It achieves significantly lower Word Error Rate compared to speech-enabled LLMs while operating at comparable latency and speech quality. Key features include being lightweight & fast with only 30M parameters, LLM-agnostic for easy integration with existing models, multi-queue streaming for continuous speech generation, and multilingual support for easy adaptation to new languages.

Q-Bench
Q-Bench is a benchmark for general-purpose foundation models on low-level vision, focusing on multi-modality LLMs performance. It includes three realms for low-level vision: perception, description, and assessment. The benchmark datasets LLVisionQA and LLDescribe are collected for perception and description tasks, with open submission-based evaluation. An abstract evaluation code is provided for assessment using public datasets. The tool can be used with the datasets API for single images and image pairs, allowing for automatic download and usage. Various tasks and evaluations are available for testing MLLMs on low-level vision tasks.

pyllms
PyLLMs is a minimal Python library for connecting to Large Language Models (LLMs) from providers such as OpenAI, Anthropic, Google, AI21, Cohere, Aleph Alpha, and HuggingfaceHub. It includes a built-in model performance benchmark for fast prototyping and for evaluating models on quality, speed, and cost. Users can connect to top LLMs and get completions from multiple models simultaneously. The library supports asynchronous completion, streaming from compatible models, and multi-model initialization for testing and comparison. It also supports passing chat history and system messages, and counting tokens.

xlings
Xlings is a developer tool for programming learning, development, and course building. It provides features such as software installation, one-click environment setup, project dependency management, and cross-platform language package management. Additionally, it offers real-time compilation and running, AI code suggestions, tutorial project creation, automatic code checking for practice, and demo examples collection.

serve
Jina-Serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. It provides native support for major ML frameworks and data types, high-performance service design with scaling and dynamic batching, LLM serving with streaming output, built-in Docker integration and Executor Hub, one-click deployment to Jina AI Cloud, and enterprise-ready features with Kubernetes and Docker Compose support. Users can create gRPC-based AI services, build pipelines, scale services locally with replicas, shards, and dynamic batching, deploy to the cloud using Kubernetes, Docker Compose, or JCloud, and enable token-by-token streaming for responsive LLM applications.

HiveChat
HiveChat is an AI chat application designed for small and medium teams. It supports various models such as DeepSeek, OpenAI, Claude, and Gemini. One administrator configures the tool once, and the entire team can then use the different AI models. It supports features like email or Feishu login, LaTeX and Markdown rendering, DeepSeek mind map display, image understanding, AI agents, cloud data storage, and integration with multiple large model service providers. Users can engage in conversations by logging in, while administrators can configure AI service providers, manage users, and control account registration. The technology stack includes Next.js, Tailwindcss, Auth.js, PostgreSQL, Drizzle ORM, and Ant Design.

cool-admin-midway
Cool-admin (midway version) is a cool open-source backend permission management system that supports modular, plugin-based, rapid CRUD development. It facilitates the quick construction and iteration of backend management systems, deployable in various ways such as serverless, docker, and traditional servers. It features AI coding for generating APIs and frontend pages, flow orchestration for drag-and-drop functionality, modular and plugin-based design for clear and maintainable code. The tech stack includes Node.js, Midway.js, Koa.js, TypeScript for backend, and Vue.js, Element-Plus, JSX, Pinia, Vue Router for frontend. It offers friendly technology choices for both frontend and backend developers, with TypeScript syntax similar to Java and PHP for backend developers. The tool is suitable for those looking for a modern, efficient, and fast development experience.

metaso-free-api
Metaso AI Free service supports high-speed streaming output, Metaso AI web search (whole-web or academic scope, with concise, in-depth, and research modes), zero-configuration deployment, and multi-token support. It is fully compatible with the ChatGPT interface. Seven other free APIs are also available. The tool provides various deployment options such as Docker, Docker-compose, Render, Vercel, and native deployment. Users can access the tool for chat completions and token liveness checks. Note: the reverse-engineered API is unstable, so using the official Metaso AI website is recommended to avoid the risk of being banned. This project is for research and learning purposes only, not for commercial use.

agentscope
AgentScope is a multi-agent platform designed to empower developers to build multi-agent applications with large-scale models. It features three high-level capabilities: Easy-to-Use, High Robustness, and Actor-Based Distribution. AgentScope provides a list of `ModelWrapper` to support both local model services and third-party model APIs, including OpenAI API, DashScope API, Gemini API, and ollama. It also enables developers to rapidly deploy local model services using libraries such as ollama (CPU inference), Flask + Transformers, Flask + ModelScope, FastChat, and vllm. AgentScope supports various services, including Web Search, Data Query, Retrieval, Code Execution, File Operation, and Text Processing. Example applications include Conversation, Game, and Distribution. AgentScope is released under Apache License 2.0 and welcomes contributions.
For similar jobs

promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aim to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our survey of the field, affectionately titled "Everything I know about machine learning and camera traps".

leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

AI-YinMei
AI-YinMei is a development tool for AI virtual anchors (VTubers), in a version for NVIDIA GPUs. It supports knowledge-base chat through fastgpt, with a complete LLM stack of [fastgpt] + [one-api] + [Xinference]. It connects to bilibili live streams to reply to bullet comments and to greet viewers who enter the room. Speech synthesis is available through Microsoft edge-tts, Bert-VITS2, and GPT-SoVITS. It supports expression control through Vtuber Studio, image generation with stable-diffusion-webui output to an OBS live room, and NSFW detection for generated images with public-NSFW-y-distinguish. Search and image search use duckduckgo (requires a proxy for Internet access) or Baidu image search (no proxy required). Other features include an AI reply chat box [html plug-in], AI singing with Auto-Convert-Music, a playlist [html plug-in], dancing, expression video playback, head-touching and gift-smashing actions, automatic dancing when singing starts, automatic idle swaying while chatting and singing, multi-scene switching, background music switching, automatic day and night scene switching, and open-ended singing and painting where the AI decides the content automatically.