NVIDIA’s open-source library that compiles Transformer blocks into highly-optimized TensorRT engines for blazing-fast LLM inference on NVIDIA GPUs.
NVIDIA Dynamo is an open-source, high-throughput, low-latency inference framework that scales generative-AI and reasoning models across large, multi-node GPU clusters.