agents
A framework for building realtime voice AI agents 🤖🎙️📹
Stars: 9390
The LiveKit Agent Framework is designed for building real-time, programmable participants that run on servers. Easily tap into LiveKit WebRTC sessions and process or generate audio, video, and data streams. The framework includes plugins for common workflows, such as voice activity detection and speech-to-text. Agents integrates seamlessly with LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.
README:
Looking for the JS/TS library? Check out AgentsJS
The Agent Framework is designed for building realtime, programmable participants that run on servers. Use it to create conversational, multi-modal voice agents that can see, hear, and understand.
- Flexible integrations: A comprehensive ecosystem to mix and match the right STT, LLM, TTS, and Realtime API to suit your use case.
- Integrated job scheduling: Built-in task scheduling and distribution with dispatch APIs to connect end users to agents.
- Extensive WebRTC clients: Build client applications using LiveKit's open-source SDK ecosystem, supporting all major platforms.
- Telephony integration: Works seamlessly with LiveKit's telephony stack, allowing your agent to make calls to or receive calls from phones.
- Exchange data with clients: Use RPCs and other Data APIs to seamlessly exchange data with clients.
- Semantic turn detection: Uses a transformer model to detect when a user has finished their turn, which helps reduce interruptions.
- MCP support: Native support for MCP. Integrate tools provided by MCP servers with one line of code.
- Built-in test framework: Write tests and use judges to ensure your agent is performing as expected.
- Open-source: Fully open-source, allowing you to run the entire stack on your own servers, including LiveKit server, one of the most widely used WebRTC media servers.
To install the core Agents library, along with plugins for popular model providers:

    pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.0"

Documentation on the framework and how to use it is available in the LiveKit docs.
- Agent: An LLM-based application with defined instructions.
- AgentSession: A container for agents that manages interactions with end users.
- entrypoint: The starting point for an interactive session, similar to a request handler in a web server.
- AgentServer: The main process that coordinates job scheduling and launches agents for user sessions.
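
To make the relationship between these pieces concrete, here is a toy model in plain `asyncio`. It is not the LiveKit implementation and does not use the `livekit-agents` API; `ToyServer`, `ToyJob`, and `handle_job` are hypothetical names chosen for illustration. It only sketches the request-handler analogy: a server registers one entrypoint and runs it once per incoming job, concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ToyJob:
    """Hypothetical stand-in for a job: one user session in one room."""

    room: str


Entrypoint = Callable[[ToyJob], Awaitable[str]]


class ToyServer:
    """Hypothetical stand-in for a server that launches one entrypoint per job."""

    def __init__(self) -> None:
        self._entrypoint: Optional[Entrypoint] = None

    def session(self) -> Callable[[Entrypoint], Entrypoint]:
        # Decorator that registers the entrypoint, much like a web route handler.
        def register(fn: Entrypoint) -> Entrypoint:
            self._entrypoint = fn
            return fn

        return register

    async def dispatch(self, jobs: List[ToyJob]) -> List[str]:
        if self._entrypoint is None:
            raise RuntimeError("no entrypoint registered")
        # Each job gets its own concurrent entrypoint invocation.
        results = await asyncio.gather(*(self._entrypoint(job) for job in jobs))
        return list(results)


toy_server = ToyServer()


@toy_server.session()
async def handle_job(job: ToyJob) -> str:
    await asyncio.sleep(0)  # placeholder for real session work
    return f"session started in {job.room}"


def run_demo() -> List[str]:
    return asyncio.run(toy_server.dispatch([ToyJob("room-a"), ToyJob("room-b")]))
```

In the real framework the queuing and scheduling are handled by LiveKit server rather than by an in-process list of jobs; the sketch only mirrors the shape of the programming model.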

    from livekit.agents import (
        Agent,
        AgentServer,
        AgentSession,
        JobContext,
        RunContext,
        cli,
        function_tool,
        inference,
    )
    from livekit.plugins import silero


    @function_tool
    async def lookup_weather(
        context: RunContext,
        location: str,
    ):
        """Used to look up weather information."""

        return {"weather": "sunny", "temperature": 70}


    server = AgentServer()


    @server.rtc_session()
    async def entrypoint(ctx: JobContext):
        session = AgentSession(
            vad=silero.VAD.load(),
            # any combination of STT, LLM, TTS, or realtime API can be used
            # this example shows LiveKit Inference, a unified API to access different models via LiveKit Cloud
            # to use model provider keys directly, replace with the following:
            # from livekit.plugins import deepgram, openai, cartesia
            # stt=deepgram.STT(model="nova-3"),
            # llm=openai.LLM(model="gpt-4.1-mini"),
            # tts=cartesia.TTS(model="sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
            stt=inference.STT("deepgram/nova-3", language="multi"),
            llm=inference.LLM("openai/gpt-4.1-mini"),
            tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        )

        agent = Agent(
            instructions="You are a friendly voice assistant built by LiveKit.",
            tools=[lookup_weather],
        )

        await session.start(agent=agent, room=ctx.room)
        await session.generate_reply(instructions="greet the user and ask about their day")


    if __name__ == "__main__":
        cli.run_app(server)

You'll need the following environment variables for this example:
- LIVEKIT_URL
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET
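
A quick pre-flight check can catch a missing variable before the server tries to connect. The helper below is a minimal sketch using only the standard library; `missing_livekit_env` and `REQUIRED_VARS` are hypothetical names, not part of the framework.

```python
import os
from typing import List, Mapping

# The three variables listed above.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_env(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

One way to use it is to call `missing_livekit_env()` at the top of your script and exit with a readable message if the returned list is not empty.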
This code snippet is abbreviated. For the full example, see multi_agent.py

    ...
    class IntroAgent(Agent):
        def __init__(self) -> None:
            super().__init__(
                instructions="You are a storyteller. Your goal is to gather a few pieces of information from the user to make the story personalized and engaging. "
                "Ask the user for their name and where they are from."
            )

        async def on_enter(self):
            self.session.generate_reply(instructions="greet the user and gather information")

        @function_tool
        async def information_gathered(
            self,
            context: RunContext,
            name: str,
            location: str,
        ):
            """Called when the user has provided the information needed to make the story personalized and engaging.

            Args:
                name: The name of the user
                location: The location of the user
            """

            context.userdata.name = name
            context.userdata.location = location

            # returning an agent from a tool hands the conversation over to it
            story_agent = StoryAgent(name, location)
            return story_agent, "Let's start the story!"


    class StoryAgent(Agent):
        def __init__(self, name: str, location: str, chat_ctx: Optional[ChatContext] = None) -> None:
            super().__init__(
                instructions="You are a storyteller. Use the user's information in order to make the story personalized. "
                f"The user's name is {name}, from {location}",
                # override the default model, switching to Realtime API from standard LLMs
                llm=openai.realtime.RealtimeModel(voice="echo"),
                # optional prior conversation history to carry into this agent
                chat_ctx=chat_ctx,
            )

        async def on_enter(self):
            self.session.generate_reply()


    @server.rtc_session()
    async def entrypoint(ctx: JobContext):
        userdata = StoryData()
        session = AgentSession[StoryData](
            vad=silero.VAD.load(),
            stt="deepgram/nova-3",
            llm="openai/gpt-4.1-mini",
            tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
            userdata=userdata,
        )

        await session.start(
            agent=IntroAgent(),
            room=ctx.room,
        )
    ...

Automated tests are essential for building reliable agents, especially with the non-deterministic behavior of LLMs. LiveKit Agents includes native test integration to help you create dependable agents.
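
    import pytest
    from livekit.agents import AgentSession
    from livekit.plugins import google

    # MyAgent is the agent under test, defined elsewhere in your project


    @pytest.mark.asyncio
    async def test_no_availability() -> None:
        llm = google.LLM()
        async with AgentSession(llm=llm) as sess:
            await sess.start(MyAgent())
            result = await sess.run(
                user_input="Hello, I need to place an order."
            )
            # the assistant may or may not speak before calling the tool
            result.expect.skip_next_event_if(type="message", role="assistant")
            result.expect.next_event().is_function_call(name="start_order")
            result.expect.next_event().is_function_call_output()
            # let an LLM judge whether the final reply matches the intent
            await (
                result.expect.next_event()
                .is_message(role="assistant")
                .judge(llm, intent="assistant should be asking the user what they would like")
            )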
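
The assertions in the test above read the run's events in order. The following toy model, written with only the standard library, sketches that cursor-style pattern. `ToyEvent` and `ToyExpect` are hypothetical names, and this is not the framework's implementation, so details such as how an optional event is skipped are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyEvent:
    """Hypothetical event record: a message, a function call, or its output."""

    type: str
    role: Optional[str] = None
    name: Optional[str] = None


class ToyExpect:
    """Walks a list of events front to back, handing out one event at a time."""

    def __init__(self, events: List[ToyEvent]) -> None:
        self._events = events
        self._pos = 0

    def skip_next_event_if(self, type: str, role: Optional[str] = None) -> bool:
        # Consume the next event only when it matches; otherwise leave it in place.
        if self._pos < len(self._events):
            ev = self._events[self._pos]
            if ev.type == type and (role is None or ev.role == role):
                self._pos += 1
                return True
        return False

    def next_event(self) -> ToyEvent:
        # Consume and return the next event, failing if none remain.
        assert self._pos < len(self._events), "no more events"
        ev = self._events[self._pos]
        self._pos += 1
        return ev


events = [
    ToyEvent(type="message", role="assistant"),  # optional preamble
    ToyEvent(type="function_call", name="start_order"),
    ToyEvent(type="function_call_output"),
    ToyEvent(type="message", role="assistant"),
]
expect = ToyExpect(events)
skipped = expect.skip_next_event_if(type="message", role="assistant")
call = expect.next_event()
output = expect.next_event()
reply = expect.next_event()
```

Skipping an optional leading message keeps such a test stable when the model sometimes speaks before calling a tool and sometimes does not.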

- A starter agent optimized for voice conversations.
- Responds to multiple users in the room via push-to-talk.
- Background ambient and thinking audio to improve realism.
- Creating function tools dynamically.
- Agent that makes outbound phone calls.
- Using structured output from LLM to guide TTS tone.
- Use tools from MCP servers.
- Skip voice altogether and use the same code for text-only integrations.
- Produce transcriptions from all users in the room.
- Add an AI avatar with Tavus, Hedra, Bithuman, LemonSlice, and more.
- Full example of an agent that handles calls for a restaurant.
- Full example (including iOS app) of Gemini Live agent that can see.

    python myagent.py console

Runs your agent in terminal mode, enabling local audio input and output for testing. This mode doesn't require external servers or dependencies and is useful for quickly validating behavior.

    python myagent.py dev

Starts the agent server and enables hot reloading when files change. This mode allows each process to host multiple concurrent agents efficiently.
The agent connects to LiveKit Cloud or your self-hosted server. Set the following environment variables:
- LIVEKIT_URL
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET
You can connect using any LiveKit client SDK or telephony integration. To get started quickly, try the Agents Playground.

    python myagent.py start

Runs the agent with production-ready optimizations.
The Agents framework is under active development in a rapidly evolving field. We welcome and appreciate contributions of any kind, be it feedback, bugfixes, features, new plugins and tools, or better documentation. You can file issues under this repo, open a PR, or chat with us in LiveKit's Slack community.
| LiveKit Ecosystem | |
|---|---|
| LiveKit SDKs | Browser · iOS/macOS/visionOS · Android · Flutter · React Native · Rust · Node.js · Python · Unity · Unity (WebGL) · ESP32 |
| Server APIs | Node.js · Golang · Ruby · Java/Kotlin · Python · Rust · PHP (community) · .NET (community) |
| UI Components | React · Android Compose · SwiftUI · Flutter |
| Agents Frameworks | Python · Node.js · Playground |
| Services | LiveKit server · Egress · Ingress · SIP |
| Resources | Docs · Example apps · Cloud · Self-hosting · CLI |
Alternative AI tools for agents
Similar Open Source Tools
DevoxxGenieIDEAPlugin
Devoxx Genie is a Java-based IntelliJ IDEA plugin that integrates with local and cloud-based LLM providers to aid in reviewing, testing, and explaining project code. It supports features like code highlighting, chat conversations, and adding files/code snippets to context. Users can modify REST endpoints and LLM parameters in settings, including support for cloud-based LLMs. The plugin requires IntelliJ version 2023.3.4 and JDK 17. Building and publishing the plugin is done using Gradle tasks. Users can select an LLM provider, choose code, and use commands like review, explain, or generate unit tests for code analysis.
nous
Nous is an open-source TypeScript platform for autonomous AI agents and LLM based workflows. It aims to automate processes, support requests, review code, assist with refactorings, and more. The platform supports various integrations, multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It offers advanced features like reasoning/planning, memory and function call history, hierarchical task decomposition, and control-loop function calling options. Nous is designed to be a flexible platform for the TypeScript community to expand and support different use cases and integrations.
arbigent
Arbigent (Arbiter-Agent) is an AI agent testing framework designed to make AI agent testing practical for modern applications. It addresses challenges faced by traditional UI testing frameworks and AI agents by breaking down complex tasks into smaller, dependent scenarios. The framework is customizable for various AI providers, operating systems, and form factors, empowering users with extensive customization capabilities. Arbigent offers an intuitive UI for scenario creation and a powerful code interface for seamless test execution. It supports multiple form factors, optimizes UI for AI interaction, and is cost-effective by utilizing models like GPT-4o mini. With a flexible code interface and open-source nature, Arbigent aims to revolutionize AI agent testing in modern applications.
agent-framework
Microsoft Agent Framework is a comprehensive multi-language framework for building, orchestrating, and deploying AI agents with support for both .NET and Python implementations. It provides everything from simple chat agents to complex multi-agent workflows with graph-based orchestration. The framework offers features like graph-based workflows, AF Labs for experimental packages, DevUI for interactive developer UI, support for Python and C#/.NET, observability with OpenTelemetry integration, multiple agent provider support, and flexible middleware system. Users can find documentation, tutorials, and user guides to get started with building agents and workflows. The framework also supports various LLM providers and offers contributor resources like a contributing guide, Python development guide, design documents, and architectural decision records.
TaskingAI
TaskingAI brings Firebase's simplicity to **AI-native app development**. The platform enables the creation of GPTs-like multi-tenant applications using a wide range of LLMs from various providers. It features distinct, modular functions such as Inference, Retrieval, Assistant, and Tool, seamlessly integrated to enhance the development process. TaskingAI’s cohesive design ensures an efficient, intelligent, and user-friendly experience in AI application development.
actionbook
Actionbook is a browser action engine designed for AI agents, providing up-to-date action manuals and DOM structure to enable instant website operations without guesswork. It offers faster execution, token savings, resilient automation, and universal compatibility, making it ideal for building reliable browser agents. Actionbook integrates seamlessly with AI coding assistants and offers three integration methods: CLI, MCP Server, and JavaScript SDK. The tool is well-documented and actively developed in a monorepo setup using pnpm workspaces and Turborepo.
vision-agent
AskUI Vision Agent is a powerful automation framework that enables you and AI agents to control your desktop, mobile, and HMI devices and automate tasks. It supports multiple AI models, multi-platform compatibility, and enterprise-ready features. The tool provides support for Windows, Linux, MacOS, Android, and iOS device automation, single-step UI automation commands, in-background automation on Windows machines, flexible model use, and secure deployment of agents in enterprise environments.
core
CORE is an open-source unified, persistent memory layer for all AI tools, allowing developers to maintain context across different tools like Cursor, ChatGPT, and Claude. It aims to solve the issue of context switching and information loss between sessions by creating a knowledge graph that remembers conversations, decisions, and insights. With features like unified memory, temporal knowledge graph, browser extension, chat with memory, auto-sync from apps, and MCP integration hub, CORE provides a seamless experience for managing and recalling context. The tool's ingestion pipeline captures evolving context through normalization, extraction, resolution, and graph integration, resulting in a dynamic memory that grows and changes with the user. When recalling from memory, CORE utilizes search, re-ranking, filtering, and output to provide relevant and contextual answers. Security measures include data encryption, authentication, access control, and vulnerability reporting.
plano
Plano is an AI-native proxy server and data plane for agentic apps that simplifies agent routing, orchestration, signals and traces, guardrail filters, and smart LLM routing APIs. It centralizes core delivery concerns into a unified dataplane, allowing developers to focus on the core product logic of agentic applications. Plano supports any language or AI framework, enabling faster delivery of agents to production. It is built on industry-leading LLM research and Envoy by core contributors.
swiftide
Swiftide is a fast, streaming indexing and query library tailored for Retrieval Augmented Generation (RAG) in AI applications. It is built in Rust, utilizing parallel, asynchronous streams for blazingly fast performance. With Swiftide, users can easily build AI applications from idea to production in just a few lines of code. The tool addresses frustrations around performance, stability, and ease of use encountered while working with Python-based tooling. It offers features like fast streaming indexing pipeline, experimental query pipeline, integrations with various platforms, loaders, transformers, chunkers, embedders, and more. Swiftide aims to provide a platform for data indexing and querying to advance the development of automated Large Language Model (LLM) applications.
BentoML
BentoML is an open-source model serving library for building performant and scalable AI applications with Python. It comes with everything you need for serving optimization, model packaging, and production deployment.
fastagency
FastAgency is an open-source framework designed to accelerate the transition from prototype to production for multi-agent AI workflows. It provides a unified programming interface for deploying agentic workflows written in the AG2 agentic framework in both development and production settings. With features like seamless external API integration, a Tester Class for continuous integration, and a Command-Line Interface (CLI) for orchestration, FastAgency streamlines the deployment process, saving time and effort while maintaining flexibility and performance. Whether orchestrating complex AI agents or integrating external APIs, FastAgency helps users quickly transition from concept to production, reducing development cycles and optimizing multi-agent systems.
fastagency
FastAgency is a powerful tool that leverages the AutoGen framework to quickly build applications with multi-agent workflows. It supports various interfaces like ConsoleUI and MesopUI, allowing users to create interactive applications. The tool enables defining workflows between agents, such as students and teachers, and summarizing conversations. FastAgency aims to expand its capabilities by integrating with additional agentic frameworks like CrewAI, providing more options for workflow definition and AI tool integration.
DemoGPT
DemoGPT is an all-in-one agent library that provides tools, prompts, frameworks, and LLM models for streamlined agent development. It leverages GPT-3.5-turbo to generate LangChain code, creating interactive Streamlit applications. The tool is designed for creating intelligent, interactive, and inclusive solutions in LLM-based application development. It offers model flexibility, iterative development, and a commitment to user engagement. Future enhancements include integrating Gorilla for autonomous API usage and adding a publicly available database for refining the generation process.
zylos-core
Zylos is an AI framework that gives life to Claude Code, providing memory persistence, scheduling capabilities, communication through various channels, self-maintenance, and evolution. It ensures Claude Code can program, evolve, and integrate new services, growing alongside the user. Zylos offers features like unified consciousness across channels, guaranteed context preservation, self-healing mechanisms, cost-effective pricing, and integration with Claude Code for automatic updates and new capabilities.
For similar tasks
awesome-generative-ai
A curated list of Generative AI projects, tools, artworks, and models
TeroSubtitler
Tero Subtitler is an open source, cross-platform, and free subtitle editing software with a user-friendly interface. It offers fully fledged editing with SMPTE and MEDIA modes, support for various subtitle formats, multi-level undo/redo, search and replace, auto-backup, source and transcription modes, translation memory, audiovisual preview, timeline with waveform visualizer, manipulation tools, formatting options, quality control features, translation and transcription capabilities, validation tools, automation for correcting errors, and more. It also includes features like exporting subtitles to MP3, importing/exporting Blu-ray SUP format, generating blank video, generating video with hardcoded subtitles, video dubbing, and more. The tool utilizes powerful multimedia playback engines like mpv, advanced audio/video manipulation tools like FFmpeg, tools for automatic transcription like whisper.cpp/Faster-Whisper, auto-translation API like Google Translate, and ElevenLabs TTS for video dubbing.
novel2video
Novel2Video is a tool designed to batch convert novel content into images and audio, ultimately generating novel tweets. It uses llama-3.1-405b for extracting novel scenes, compatible with openaiapi. It supports Stable Diffusion web UI and ComfyUI, character locking for consistency, batch image output, single image redraw, and EdgeTTS for text-to-speech conversion.
Awesome-World-Models
This repository is a curated list of papers related to World Models for General Video Generation, Embodied AI, and Autonomous Driving. It includes foundation papers, blog posts, technical reports, surveys, benchmarks, and specific world models for different applications. The repository serves as a valuable resource for researchers and practitioners interested in world models and their applications in robotics and AI.
aigc-platform-server
This project aims to integrate mainstream open-source large models to achieve the coordination and cooperation between different types of large models, providing comprehensive and flexible AI content generation services.
Applio
Applio is a VITS-based Voice Conversion tool focused on simplicity, quality, and performance. It features a user-friendly interface, cross-platform compatibility, and a range of customization options. Applio is suitable for various tasks such as voice cloning, voice conversion, and audio editing. Its key features include a modular codebase, hop length implementation, translations in over 30 languages, optimized requirements, streamlined installation, hybrid F0 estimation, easy-to-use UI, optimized code and dependencies, plugin system, overtraining detector, model search, enhancements in pretrained models, voice blender, accessibility improvements, new F0 extraction methods, output format selection, hashing system, model download system, TTS enhancements, split audio, Discord presence, Flask integration, and support tab.
Demucs-Gui
Demucs GUI is a graphical user interface for the music separation project Demucs. It aims to allow users without coding experience to easily separate tracks. The tool provides a user-friendly interface for running the Demucs project, which originally used the scientific library torch. The GUI simplifies the process of separating tracks and provides support for different platforms such as Windows, macOS, and Linux. Users can donate to support the development of new models for the project, and the tool has specific system requirements including minimum system versions and hardware specifications.
