Overview
TGI shards decoder weights across GPUs, streams tokens over SSE/GRPC and exposes an OpenAI-style /generate
route.
Key Capabilities
- Tensor / pipeline parallel & quantization
- KV-cache, speculative decoding, vLLM sampler
- Prometheus + Jaeger telemetry hooks