Overview
MLC-LLM turns any HuggingFace checkpoint into a highly-optimized library with int4/int8 kernels, delivering OpenAI-style APIs on desktop, mobile and browser.
Key Capabilities
- Ahead-of-time compilation via TVM Unity
- Unified engine with Metal, Vulkan, CUDA back-ends
- REST, Python, JS, iOS & Android SDKs
- WebLLM for client-side web inference