Overview
KTransformers swaps standard attention blocks for pluggable high-throughput kernels while exposing the familiar HF API.
Key Capabilities
- Drop-in replacement via one import
- REST server & ChatGPT-style web UI
- Supports paged-attention & FlashInfer