Overview
LiveKit Agents is a server-side framework for creating realtime, programmable voice and multimodal AI agents. It’s designed to let developers combine speech-to-text (STT), large language models (LLMs), text-to-speech (TTS), and LiveKit’s realtime WebRTC APIs to build conversational participants that can hear, speak, and (optionally) see. The project focuses on production-readiness: job scheduling and dispatch, telephony integration, built-in testing, and an extensible plugin system for model and provider integrations.
Key features
- Flexible integrations: mix-and-match STT, LLM, and TTS providers via plugins so agents can use different vendors or local models.
- Realtime voice support: built on LiveKit’s WebRTC ecosystem, allowing agents to act as realtime participants in rooms and calls.
- Telephony & SIP: works with LiveKit’s telephony stack to make and receive phone calls (inbound and outbound).
- Job scheduling & dispatch APIs: built-in worker model and dispatch system to route users to agents and scale processing.
- Semantic turn detection: uses a turn-detection model to identify when the user has finished speaking, reducing interruptions during voice conversations.
- MCP support: native support for MCP tool integrations to access external tools/services from agents (see the tool sketch after this list).
- Testing & judges: testing primitives to write reproducible tests and automated judges for non-deterministic LLM behaviour.
- Examples & samples: many runnable examples (starter agent, multi-agent handoff, background audio, multi-user push-to-talk, restaurant ordering, vision demos).
- Open-source & self-hostable: you can run the whole stack (agents + LiveKit server) on your own infrastructure.
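To ground the tool-related features above, here is a minimal sketch of how an agent might expose a callable tool, assuming the 1.x Python API (Agent, function_tool, RunContext); the lookup_weather tool and its return value are purely illustrative:

```python
from livekit.agents import Agent, RunContext, function_tool

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")

    # Tools are async methods; the docstring and type hints are surfaced
    # to the LLM so it knows when and how to call them. Tools exposed by
    # an MCP server become callable alongside locally defined ones.
    @function_tool()
    async def lookup_weather(self, context: RunContext, location: str) -> str:
        """Called when the user asks about the weather in a location."""
        # Illustrative stub; a real tool would call a weather API here.
        return f"The weather in {location} is sunny and 22°C."
```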
Installation & quick start
The Python package is published as livekit-agents. A typical install command includes common provider plugins, for example:
```bash
pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]~=1.0"
```
A minimal flow:
- Write an Agent class (instructions, tools, optional lifecycle hooks).
- Create an AgentSession that wires together the vad, stt, llm, and tts components, plus any per-session user data.
- Start the session inside a worker entrypoint and connect it to a LiveKit room or telephony call.
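Putting those three steps together, a minimal worker might look like the sketch below. It assumes the 1.x Python API and the plugins from the install command above; the model names are illustrative defaults, and LiveKit credentials (LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET) plus provider API keys are expected in the environment:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")

async def entrypoint(ctx: agents.JobContext):
    # Wire up the voice pipeline: VAD, STT, LLM, TTS, and turn detection.
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        turn_detection=MultilingualModel(),
    )

    # Attach the agent to the room this job was dispatched to.
    await session.start(room=ctx.room, agent=Assistant())
    await ctx.connect()

    # Have the agent speak first.
    await session.generate_reply(instructions="Greet the user and offer your help.")

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```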
Typical use cases
- Voice assistants for customer support and call centers (including outbound calling).
- Real-time multimodal agents that combine speech, audio, and vision (e.g., Gemini Live vision example).
- In-room interactive characters and avatar agents with TTS and background audio.
- Pipelines that require scheduling and dispatch of agent jobs across workers.
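For the dispatch-oriented pipelines above, workers register under an agent name and jobs are dispatched to rooms explicitly. The following is a hedged sketch using the livekit-api server SDK; the agent name, room, and metadata values are illustrative:

```python
import asyncio

from livekit import api

async def dispatch_support_agent() -> None:
    # Reads LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET from the environment.
    lkapi = api.LiveKitAPI()
    await lkapi.agent_dispatch.create_dispatch(
        api.CreateAgentDispatchRequest(
            agent_name="support-agent",  # must match WorkerOptions(agent_name=...)
            room="support-room-12345",
            metadata='{"user_id": "12345"}',
        )
    )
    await lkapi.aclose()

if __name__ == "__main__":
    asyncio.run(dispatch_support_agent())
```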
Extensibility & ecosystem
LiveKit Agents exposes plugin points for STT, TTS, turn detection, LLM backends, and data APIs. It integrates with LiveKit’s broader SDK ecosystem (browser, iOS, Android, Flutter, server SDKs) and example repos. The README includes many example scripts and links to a JS/TS sibling library (AgentsJS) and an Agents Playground for quick experimentation.
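Because each pipeline stage is a plugin, swapping vendors is a constructor change rather than a rewrite. For instance, assuming the livekit-plugins-google and livekit-plugins-elevenlabs packages are installed, the session from the quick start could mix different providers:

```python
from livekit.agents import AgentSession
from livekit.plugins import elevenlabs, google, openai, silero

# Same session shape as before, different vendor per stage.
session = AgentSession(
    vad=silero.VAD.load(),
    stt=google.STT(),              # Google Cloud Speech-to-Text
    llm=openai.LLM(model="gpt-4o"),
    tts=elevenlabs.TTS(),          # ElevenLabs voices
)
```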
Testing & production
The framework emphasises testing for LLM-driven workflows, providing utilities to assert function calls, judge generated messages with an LLM, and expect or skip events. For production, the project offers a worker CLI that runs agents (with hot reloading during development) and uses environment variables to connect to LiveKit Cloud or a self-hosted LiveKit server.
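As an illustration, a behavioural test might look like the sketch below, which assumes the 1.x testing helpers (session.run, result.expect, and LLM-based judging) together with pytest and pytest-asyncio; the intent string and model choice are illustrative:

```python
import pytest

from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")

@pytest.mark.asyncio
async def test_assistant_greeting() -> None:
    async with (
        openai.LLM(model="gpt-4o-mini") as llm,
        AgentSession(llm=llm) as session,
    ):
        await session.start(Assistant())

        # Drive one user turn and inspect the resulting events.
        result = await session.run(user_input="Hello")

        # Judge the reply against an intent instead of matching exact text.
        await result.expect.next_event().is_message(role="assistant").judge(
            llm, intent="Offers a friendly greeting and assistance."
        )
        result.expect.no_more_events()
```

During development, the same worker can be run with hot reloading via the CLI (for example, python agent.py dev), with LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET pointing at LiveKit Cloud or a self-hosted server.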
Community & docs
Comprehensive documentation is available on the official LiveKit docs site. The repo links to community resources such as Slack and further examples. The project is MIT-licensed (see the repo for the full license text).
