
KTransformers

A Pythonic framework for injecting experimental KV-cache optimizations into Hugging Face Transformers stacks.

Overview

KTransformers replaces the standard attention modules in a loaded model with pluggable high-throughput kernels while preserving the familiar Hugging Face Transformers API, so existing calling code keeps working unchanged.
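
The mechanism is easiest to see as plain module injection. The sketch below is not KTransformers' actual API; it illustrates the general pattern of walking a loaded Hugging Face model and swapping each attention module for a custom replacement, with FastAttention as a hypothetical stand-in for an optimized kernel.

    import torch
    from torch import nn
    from transformers import AutoModelForCausalLM

    class FastAttention(nn.Module):
        """Hypothetical optimized attention; wraps the original module."""
        def __init__(self, orig: nn.Module):
            super().__init__()
            self.orig = orig  # a real kernel would reuse orig's weights

        def forward(self, *args, **kwargs):
            # Placeholder: delegate to the original implementation.
            # An optimized kernel would run paged/flash attention here.
            return self.orig(*args, **kwargs)

    def inject(model: nn.Module, target: str) -> nn.Module:
        """Replace every direct child named `target`, recursing down."""
        for name, module in model.named_children():
            if name == target:
                setattr(model, name, FastAttention(module))
            else:
                inject(module, target)
        return model

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model = inject(model, target="attn")  # GPT-2 names its attention "attn"

Because the wrapper keeps the original module's interface, the rest of the model, and any user code built on the Transformers API, never sees the swap.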

Key Capabilities
  • Drop-in replacement via a single import
  • REST server & ChatGPT-style web UI (a request sketch follows this list)
  • Supports PagedAttention & FlashInfer kernels
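
Assuming the REST server exposes an OpenAI-compatible chat-completions endpoint, a client call could look like the following. The host, port, route, and model name are illustrative assumptions, not documented KTransformers defaults.

    import requests

    # Endpoint and model name are assumptions for illustration only.
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "ktransformers-model",
            "messages": [{"role": "user", "content": "Hello!"}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])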
