NVIDIA TensorRT is an SDK that compiles and optimizes trained neural-network models for low-latency, high-throughput inference on NVIDIA GPUs.
A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.
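Such a local runtime is typically driven over HTTP. As a minimal sketch, the snippet below builds (but does not send) a JSON POST request for one completion; the endpoint path, port, and payload fields (`model`, `prompt`, `stream`) are assumptions for illustration, not details taken from the description above.

```python
import json
from urllib import request

# Hypothetical local endpoint; the path and port are assumptions.
URL = "http://localhost:8000/api/generate"

def build_request(model: str, prompt: str) -> request.Request:
    """Build a JSON POST request asking the local server for one completion."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return request.Request(
        URL,
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("my-local-model", "Why is the sky blue?")
print(req.get_method())  # POST
```

Sending the request with `urllib.request.urlopen(req)` would work once a server is actually listening on that port.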
An open-source, production-ready system for serving machine-learning models at scale.
Open-source, high-performance framework and DSL for serving large language and vision-language models, with low latency and controllable, structured generation.
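"Structured generation" means constraining the model so its output always matches a target format. As a toy, character-level sketch (real frameworks apply this constraint to token logits during decoding; the pattern and candidate list here are invented for illustration), each step keeps only the candidates whose addition still leads to a valid final string:

```python
# Enumerable set of outputs the generator is allowed to produce
# (a stand-in for a grammar or JSON schema; an assumption for this sketch).
VALID_OUTPUTS = ['{"answer": "yes"}', '{"answer": "no"}']

def allowed(prefix: str, candidates: list[str]) -> list[str]:
    """Keep candidates c where prefix + c is still a prefix of a valid output."""
    return [
        c for c in candidates
        if any(v.startswith(prefix + c) for v in VALID_OUTPUTS)
    ]

# Mid-generation, only continuations that stay inside the schema survive.
print(allowed('{"answer": "', ["yes", "no", "maybe"]))  # ['yes', 'no']
```

Because invalid continuations are filtered out at every step, the final output is valid by construction rather than by post-hoc parsing and retrying.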