LMCache is an open-source, high-performance KV (key-value) cache layer that accelerates LLM serving and inference, especially in long-context scenarios. By storing reusable KV caches across GPU memory, CPU DRAM, and local disk, and by sharing them across serving instances, LMCache reduces time-to-first-token (TTFT) and saves GPU compute. It integrates tightly with vLLM, supports peer-to-peer (P2P) cache sharing, non-prefix caching, and multiple storage backends (CPU, disk, NIXL), and is distributed under the Apache-2.0 license.
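To make the vLLM integration concrete, here is a minimal sketch of wiring LMCache into vLLM as a KV-transfer connector, following the pattern in the LMCache quickstart. The connector name (`LMCacheConnectorV1`), the `LMCACHE_MAX_LOCAL_CPU_SIZE` environment variable, and the model name are assumptions that may differ across versions; check the current LMCache and vLLM documentation before relying on them.

```python
# Sketch: offload vLLM's KV cache to LMCache's CPU-DRAM tier.
# Connector/env-var names follow the LMCache quickstart and may change
# between releases; treat this as a configuration example, not a spec.
import os

# Cap the CPU-DRAM cache tier (in GiB); KV blocks evicted from GPU
# memory spill over into this tier instead of being recomputed.
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"

from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, swap as needed
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # route KV blocks through LMCache
        kv_role="kv_both",                  # this instance both stores and loads
    ),
)

# Repeated prompts sharing a long prefix now hit the LMCache tiers,
# lowering TTFT on the second and later requests.
outputs = llm.generate(["Summarize the following document: ..."])
```

Because the connector sits below vLLM's scheduler, application code is unchanged: the same `generate` calls transparently benefit from cache hits.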