Overview
Xinference makes it one-command to host heterogeneous generative models behind REST/RPC endpoints compatible with OpenAI and LangChain.
Key Capabilities
- One-click local or distributed launch
- Built-in model registry with automatic download
- Web UI & CLI with live monitoring
- CPU/GPU adaptive scheduling and elastic scaling