agents

Build real-time multimodal AI applications 🤖🎙️📹

Stars: 4573

The LiveKit Agent Framework is designed for building real-time, programmable participants that run on servers. Easily tap into LiveKit WebRTC sessions and process or generate audio, video, and data streams. The framework includes plugins for common workflows, such as voice activity detection and speech-to-text. Agents integrates seamlessly with LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.
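
The division of labour described above, where the server queues and assigns jobs and each worker runs agent code for one session, can be pictured with a small asyncio sketch. Everything here (`Job`, `entrypoint`, `run_worker`, the `max_concurrent` limit) is a hypothetical stand-in for illustration. In the real framework LiveKit server performs the queuing and dispatch, so you do not write this loop yourself.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record: one agent session in one room."""
    job_id: int
    room: str


async def entrypoint(job: Job) -> str:
    """Hypothetical per-session agent code; a real agent would join the room here."""
    await asyncio.sleep(0.01)  # stand-in for processing media in the room
    return f"job {job.job_id} handled in {job.room}"


async def run_worker(jobs: list[Job], max_concurrent: int = 2) -> list[str]:
    """Run every job's entrypoint, with at most `max_concurrent` running at once."""
    limit = asyncio.Semaphore(max_concurrent)

    async def run_one(job: Job) -> str:
        async with limit:
            return await entrypoint(job)

    # gather returns results in the same order as the jobs were passed in
    return list(await asyncio.gather(*(run_one(j) for j in jobs)))


if __name__ == "__main__":
    for line in asyncio.run(run_worker([Job(i, f"room-{i}") for i in range(5)])):
        print(line)
```

The semaphore is only there to mimic a per-worker capacity limit; scaling to thousands of sessions comes from running many such workers, not from raising one limit.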

README:

Looking for the JS/TS library? Check out AgentsJS

✨ NEW ✨

Google Gemini 2.0 support

Introducing support for the new Gemini 2.0 model. Here's an example voice agent running Google STT, TTS, and Gemini 2.0 Flash: code

In-house phrase endpointing model

We’ve trained a new open-weights phrase endpointing model that significantly improves end-of-turn detection and conversational flow between voice agents and users by reducing agent interruptions. Optimized to run on CPUs, it’s available via the livekit-plugins-turn-detector package.
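
To see why a dedicated end-of-turn model reduces interruptions, compare a fixed silence timeout with one that adapts to what the user actually said. The sketch below is hypothetical: `eou_probability` is a toy heuristic standing in for the trained model, and the wait times and threshold are illustrative values, not the plugin's defaults.

```python
# Toy stand-in for the trained model: the real plugin scores the transcript
# with a neural network; this heuristic only looks at punctuation and fillers.
TRAILING_FILLERS = {"and", "but", "so", "um", "uh", "because"}


def eou_probability(transcript: str) -> float:
    """Rough estimate of how likely it is that the user has finished the turn."""
    text = transcript.strip().lower()
    words = text.rstrip(".?!,").split()
    if not words:
        return 0.0
    if words[-1] in TRAILING_FILLERS:
        return 0.1  # the sentence is clearly still going
    if text.endswith((".", "?", "!")):
        return 0.9
    return 0.5


def turn_is_over(transcript: str, silence_s: float,
                 short_wait_s: float = 0.5, long_wait_s: float = 2.0,
                 threshold: float = 0.7) -> bool:
    """Combine trailing silence with the completeness score.

    If the utterance looks complete, a short pause is enough to reply;
    otherwise the agent waits longer before treating silence as end of turn.
    """
    likely_done = eou_probability(transcript) >= threshold
    required = short_wait_s if likely_done else long_wait_s
    return silence_s >= required
```

With a fixed 0.5 s timeout, "I want to book a flight and" followed by a brief pause would trigger a reply; here the agent keeps listening.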

What is Agents?

The Agents framework enables you to build AI-driven server programs that can see, hear, and speak in realtime. It offers a fully open-source platform for creating realtime, agentic applications.

Features

  • Flexible integrations: A comprehensive ecosystem to mix and match the right models for each use case.
  • AI voice agents: VoicePipelineAgent and MultimodalAgent help orchestrate the conversation flow using LLMs and other AI models.
  • Integrated job scheduling: Built-in task scheduling and distribution with dispatch APIs to connect end users to agents.
  • Realtime media transport: Stream audio, video, and data over WebRTC and SIP with client SDKs for most platforms.
  • Telephony integration: Works seamlessly with LiveKit's telephony stack, allowing your agent to make calls to or receive calls from phones.
  • Exchange data with clients: Use RPCs and other Data APIs to seamlessly exchange data with clients.
  • Open-source: Fully open-source, allowing you to run the entire stack on your own servers, including LiveKit server, one of the most widely used WebRTC media servers.
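
VoicePipelineAgent chains STT, an LLM, and TTS for each conversational turn. The sketch below shows only that data flow, using hypothetical stub stages (`stt`, `llm`, `tts`, `voice_pipeline`); it is not the VoicePipelineAgent API, which also handles VAD, interruptions, and streaming audio over WebRTC.

```python
import asyncio
from typing import AsyncIterator


# Hypothetical stub stages; real plugins would call STT, LLM and TTS providers.
async def stt(audio_chunks: list[bytes]) -> str:
    """Pretend speech-to-text: decode each chunk as text and join them."""
    return " ".join(chunk.decode() for chunk in audio_chunks)


async def llm(prompt: str) -> AsyncIterator[str]:
    """Pretend LLM that streams its reply token by token."""
    for token in f"You said: {prompt}".split():
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield token + " "


async def tts(text: str) -> bytes:
    """Pretend text-to-speech: the 'audio' is just the encoded text."""
    return text.encode()


async def voice_pipeline(audio_chunks: list[bytes]) -> bytes:
    """One conversational turn: STT -> LLM (streamed) -> TTS."""
    transcript = await stt(audio_chunks)
    reply = ""
    async for token in llm(transcript):
        reply += token
    return await tts(reply.strip())


if __name__ == "__main__":
    print(asyncio.run(voice_pipeline([b"hello", b"agent"])))
```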

Installation

To install the core Agents library:

pip install livekit-agents

Integrations

The framework includes a variety of plugins that make it easy to process streaming input or generate output. For example, there are plugins for text-to-speech synthesis and for running inference with popular LLMs. Here's how to install a plugin:
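
Plugins can be mixed and matched because agent code talks to a common interface rather than to a specific vendor. The sketch below illustrates that idea with a hypothetical `TTS` protocol and two stub providers; the real plugin base classes in livekit-agents have richer, asynchronous interfaces.

```python
from typing import Protocol


class TTS(Protocol):
    """Hypothetical minimal interface a text-to-speech plugin could satisfy."""

    def synthesize(self, text: str) -> bytes: ...


class UppercaseTTS:
    """Stub provider A: the 'audio' is the upper-cased text."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode()


class ReversedTTS:
    """Stub provider B: the 'audio' is the reversed text."""

    def synthesize(self, text: str) -> bytes:
        return text[::-1].encode()


def speak(tts: TTS, text: str) -> bytes:
    """Agent code depends only on the interface, so providers can be swapped."""
    return tts.synthesize(text)
```

Switching providers then means changing one constructor call, for example `speak(ReversedTTS(), "hi")` instead of `speak(UppercaseTTS(), "hi")`.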

pip install livekit-plugins-openai

Realtime API

We've partnered with OpenAI on a new MultimodalAgent API in the Agents framework. This class completely wraps OpenAI’s Realtime API, abstracts away the raw wire protocol, and provides an ultra-low-latency WebRTC transport between GPT-4o and your users’ devices. This same stack powers Advanced Voice in the ChatGPT app.
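
For a sense of what "the raw wire protocol" involves: OpenAI's Realtime API exchanges JSON events, and input audio is sent as base64-encoded PCM16. The sketch below builds a few client events by hand. The event type names follow OpenAI's Realtime API documentation, while the helper functions are hypothetical; MultimodalAgent produces and handles these events for you.

```python
import base64
import json


def append_audio_event(pcm16: bytes) -> str:
    """Client event that appends raw PCM16 audio to the input buffer."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16).decode("ascii"),
    })


def commit_and_respond_events() -> list[str]:
    """Commit the buffered audio as a user turn, then request a model response.

    Assumption: this manual commit is only needed when server-side turn
    detection is disabled; otherwise the server commits the buffer itself.
    """
    return [
        json.dumps({"type": "input_audio_buffer.commit"}),
        json.dumps({"type": "response.create"}),
    ]
```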

  • Try the Realtime API in our playground [code]
  • Check out our guide to building your first app with this new API

LLM

Provider | Package | Usage
--- | --- | ---
OpenAI | livekit-plugins-openai | openai.LLM()
Azure OpenAI | livekit-plugins-openai | openai.LLM.with_azure()
Anthropic | livekit-plugins-anthropic | anthropic.LLM()
Google (Gemini) | livekit-plugins-openai | openai.LLM.with_vertex()
Cerebras | livekit-plugins-openai | openai.LLM.with_cerebras()
Groq | livekit-plugins-openai | openai.LLM.with_groq()
Ollama | livekit-plugins-openai | openai.LLM.with_ollama()
Perplexity | livekit-plugins-openai | openai.LLM.with_perplexity()
Together.ai | livekit-plugins-openai | openai.LLM.with_together()
X.ai (Grok) | livekit-plugins-openai | openai.LLM.with_x_ai()
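
Many of the providers listed with livekit-plugins-openai expose OpenAI-compatible HTTP APIs, which is presumably why they can share one client pointed at a different base URL. The sketch below illustrates that idea only; `ChatEndpoint` and `endpoint_for` are hypothetical, and the URLs are the providers' publicly documented OpenAI-compatible endpoints, which you should confirm in each provider's documentation before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatEndpoint:
    """Where an OpenAI-compatible chat client should send its requests."""
    provider: str
    base_url: str


# OpenAI-compatible base URLs (verify against each provider's docs).
OPENAI_COMPATIBLE = {
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
    "ollama": "http://localhost:11434/v1",  # local Ollama server on its default port
    "x_ai": "https://api.x.ai/v1",
}


def endpoint_for(provider: str) -> ChatEndpoint:
    """Hypothetical helper mirroring the idea behind openai.LLM.with_<provider>()."""
    try:
        return ChatEndpoint(provider, OPENAI_COMPATIBLE[provider])
    except KeyError:
        raise ValueError(f"no OpenAI-compatible endpoint known for {provider!r}") from None
```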

STT

Provider | Package | Usage
--- | --- | ---
Azure | livekit-plugins-azure | azure.STT()
Deepgram | livekit-plugins-deepgram | deepgram.STT()
OpenAI (Whisper) | livekit-plugins-openai | openai.STT()
Google | livekit-plugins-google | google.STT()
AssemblyAI | livekit-plugins-assemblyai | assemblyai.STT()
Groq (Whisper) | livekit-plugins-openai | openai.STT.with_groq()
FAL (Whizper) | livekit-plugins-fal | fal.STT()
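
Providers whose APIs accept only complete audio clips can still be used in a live conversation by cutting the incoming audio into utterances with voice activity detection and transcribing each one as it finishes. livekit-agents ships an adapter along these lines (`stt.StreamAdapter`), but its interface differs from this sketch: `segment_speech`, `transcribe_stream`, and the `(frame, is_speech)` input format are hypothetical, and the `recognize` callback stands in for a provider call.

```python
from typing import Callable, Iterable


def segment_speech(frames: Iterable[tuple[bytes, bool]]) -> list[bytes]:
    """Group consecutive speech frames into utterances.

    Each item is (audio_frame, is_speech), where is_speech would come from a VAD.
    """
    utterances: list[bytes] = []
    current = bytearray()
    for audio, is_speech in frames:
        if is_speech:
            current.extend(audio)
        elif current:  # speech just ended: close the utterance
            utterances.append(bytes(current))
            current.clear()
    if current:  # the stream ended mid-utterance
        utterances.append(bytes(current))
    return utterances


def transcribe_stream(frames: Iterable[tuple[bytes, bool]],
                      recognize: Callable[[bytes], str]) -> list[str]:
    """Emulate streaming STT by sending each finished utterance to a batch recognizer."""
    return [recognize(u) for u in segment_speech(frames)]
```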

TTS

Provider | Package | Usage
--- | --- | ---
Cartesia | livekit-plugins-cartesia | cartesia.TTS()
ElevenLabs | livekit-plugins-elevenlabs | elevenlabs.TTS()
OpenAI | livekit-plugins-openai | openai.TTS()
Azure OpenAI | livekit-plugins-openai | openai.TTS.with_azure()
Google | livekit-plugins-google | google.TTS()
Deepgram | livekit-plugins-deepgram | deepgram.TTS()
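
With a streaming TTS provider, an agent can start speaking before the LLM has finished its reply. A common approach, sketched below under simplified assumptions, is to buffer LLM tokens and hand each completed sentence to TTS. The regex rule and `sentences_from_tokens` are hypothetical; production sentence tokenizers also handle abbreviations, numbers, and other languages.

```python
import re
from typing import Iterable, Iterator

# Simplified rule: a sentence ends with ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Turn a stream of LLM tokens into complete sentences for TTS."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # every part except the last is a finished sentence
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends
```

Each yielded sentence could be sent to TTS immediately, so the first sentence is already playing while later tokens are still arriving.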

Other plugins

Plugin | Description
--- | ---
livekit-plugins-rag | Simple RAG based on Annoy
livekit-plugins-llama-index | RAG with LlamaIndex
livekit-plugins-nltk | Utilities for working with text
livekit-plugins-vad | Voice activity detection
livekit-plugins-turn-detector | Conversational turn detection model
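
livekit-plugins-rag uses Annoy, an approximate nearest-neighbour index, to find stored passages whose embeddings are close to the user's query so they can be added to the LLM's context. The sketch below swaps Annoy for an exact brute-force cosine-similarity search so it runs with the standard library alone. The two-dimensional vectors stand in for real embeddings, and `retrieve` is a hypothetical helper, not the plugin's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the k passages whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Brute force costs one comparison per stored passage per query; an approximate index such as Annoy trades a little accuracy for much faster lookups on large collections.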

Documentation and guides

Documentation on the framework and how to use it can be found here

Example agents

Description | Demo | Code
--- | --- | ---
A basic voice agent using a pipeline of STT, LLM, and TTS | demo | code
Voice agent using the new OpenAI Realtime API | demo | code
Super fast voice agent using Cerebras-hosted Llama 3.1 | demo | code
Voice agent using Cartesia's Sonic model | demo | code
Agent that looks up the current weather via a function call | N/A | code
Voice agent using Gemini 2.0 Flash | N/A | code
Voice agent with a custom turn-detection model | N/A | code
Voice agent that performs a RAG-based lookup | N/A | code
Video agent that publishes a stream of RGB frames | N/A | code
Transcription agent that generates text captions from a user's speech | N/A | code
A chat agent you can text that responds with generated speech | N/A | code
Localhost multi-agent conference call | N/A | code
Moderation agent that uses Hive to detect spam/abusive video | N/A | code

Contributing

The Agents framework is under active development in a rapidly evolving field. We welcome and appreciate contributions of any kind, be it feedback, bugfixes, features, new plugins and tools, or better documentation. You can file issues under this repo, open a PR, or chat with us in LiveKit's Slack community.


LiveKit Ecosystem
Realtime SDKs: Browser · iOS/macOS/visionOS · Android · Flutter · React Native · Rust · Node.js · Python · Unity · Unity (WebGL)
Server APIs: Node.js · Golang · Ruby · Java/Kotlin · Python · Rust · PHP (community)
UI Components: React · Android Compose · SwiftUI
Agents Frameworks: Python · Node.js · Playground
Services: LiveKit server · Egress · Ingress · SIP
Resources: Docs · Example apps · Cloud · Self-hosting · CLI
