Overview
LitServe is a flexible, FastAPI-based serving engine from Lightning AI. Designed for cloud or self-hosted environments, it autoscales models from zero to thousands of GPUs, handling batching, streaming, and multi-model pipelines without MLOps overhead.
Key Capabilities
- Zero-to-many autoscaling: idle servers scale to zero, traffic bursts scale out across GPUs
- At least 2× the throughput of vanilla FastAPI, thanks to built-in async batching
- Streaming & WebSocket/SSE responses for LLM chat, audio, vision
- Multi-model, multi-GPU orchestration and custom Python hooks
- One-click deployment to Lightning Cloud, or run locally with Docker