TensorRT-LLM is NVIDIA's open-source library that compiles Transformer-based models into highly optimized TensorRT engines for fast LLM inference on NVIDIA GPUs.
It accelerates large language model inference with custom attention kernels, a paged KV cache, quantization (FP8, FP4, INT8, INT4), and speculative decoding.
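
For a sense of the workflow, here is a minimal sketch in the style of the library's high-level Python `LLM` API; the model name and sampling settings are illustrative, and engine compilation happens under the hood when the model is loaded:

```python
# Minimal sketch using TensorRT-LLM's high-level Python LLM API.
# Assumes `tensorrt_llm` is installed and a supported NVIDIA GPU is available;
# the checkpoint name and sampling settings below are illustrative.
from tensorrt_llm import LLM, SamplingParams

# Load a Hugging Face checkpoint; TensorRT-LLM builds an optimized
# TensorRT engine for it on first use.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Generation runs on the compiled engine with the paged KV cache.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```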