Overview
Mooncake moves KV cache tensors between GPUs and across nodes. This lets multiple inference servers share prefill work instead of each server recomputing it, which reduces latency.
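Here is a minimal sketch of how sharing prefill work could look. It assumes a shared in-memory store keyed by a hash of the prompt tokens. `KVStore`, `PrefillServer`, `DecodeServer`, `prefix_key` and `fake_prefill` are hypothetical names used only for illustration and are not Mooncake's API. A real deployment would move GPU tensors through the transfer engine rather than storing Python lists.

```python
import hashlib
from dataclasses import dataclass, field


# Hypothetical stand-in for a shared KV cache store (NOT Mooncake's real API).
@dataclass
class KVStore:
    blocks: dict = field(default_factory=dict)

    def put(self, key: str, kv: list) -> None:
        self.blocks[key] = kv

    def get(self, key: str):
        return self.blocks.get(key)


def prefix_key(tokens: list) -> str:
    # Identical prompts hash to the same key, so their KV data is stored once.
    return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()


def fake_prefill(tokens: list) -> list:
    # Placeholder for the expensive attention pass that produces KV tensors.
    return [float(t) * 0.5 for t in tokens]


class PrefillServer:
    def __init__(self, store: KVStore):
        self.store = store
        self.computed = 0  # how many prefills this server actually ran

    def prefill(self, tokens: list) -> str:
        key = prefix_key(tokens)
        if self.store.get(key) is None:
            # Only compute when no other server has already published this prefix.
            self.store.put(key, fake_prefill(tokens))
            self.computed += 1
        return key


class DecodeServer:
    def __init__(self, store: KVStore):
        self.store = store

    def load_kv(self, key: str) -> list:
        # Fetch precomputed KV data instead of re-running prefill.
        kv = self.store.get(key)
        if kv is None:
            raise KeyError(f"no KV cache for key {key}")
        return kv


# Usage: two prefill servers share one store; the second one reuses the result.
store = KVStore()
p1, p2 = PrefillServer(store), PrefillServer(store)
k1 = p1.prefill([1, 2, 3])
k2 = p2.prefill([1, 2, 3])
kv = DecodeServer(store).load_kv(k1)
```

Because the key is derived from the token sequence, the second server finds the entry and skips computation. The decode server consumes the same KV data without running prefill itself.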
Key Capabilities
- Trace-based prefill disaggregation
- P2P store & vLLM integration
- Transfer-engine plug-in architecture