Xorbits’ universal inference layer (`xinference`) that deploys and serves LLMs and multimodal models anywhere from a single laptop to a full cluster.
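A minimal client sketch, assuming an Xinference server is already running locally (e.g. via `xinference-local`) on the default port 9997; the exact `launch_model` arguments (engine, size, quantization) vary by version and model:

```python
from xinference.client import Client  # pip install xinference

# Connect to a locally running Xinference server (default port 9997).
client = Client("http://localhost:9997")

# Launch a model by name; accepted arguments depend on the Xinference
# version and the model chosen -- treat these as illustrative.
model_uid = client.launch_model(
    model_name="qwen2.5-instruct",
    model_engine="transformers",
)

# Chat with the deployed model through the same client.
model = client.get_model(model_uid)
print(model.chat(messages=[{"role": "user", "content": "Hello!"}]))
```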
NVIDIA’s open-source library that compiles Transformer blocks into highly optimized TensorRT engines for fast, low-latency LLM inference on NVIDIA GPUs.
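This description matches TensorRT-LLM; assuming that, a minimal sketch using its high-level `LLM` API (available in recent releases), which builds the TensorRT engine under the hood on first load:

```python
from tensorrt_llm import LLM, SamplingParams  # pip install tensorrt-llm

# Engine compilation happens on first load; this can take minutes and
# requires a supported NVIDIA GPU.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["What does TensorRT do?"], params):
    print(output.outputs[0].text)
```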
CUDA kernel library that brings FlashAttention-style optimizations to any LLM serving stack.
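The core trick such kernels build on is the online softmax: attention is accumulated in a single pass over the keys, with a running max and normalizer, so the full score matrix never needs to be materialized. A NumPy sketch of that idea (illustrative only, not the library's actual API):

```python
import numpy as np

def online_softmax_attention(q, K, V):
    """One query row of attention in a single streaming pass over the keys.

    A running max `m` and normalizer `l` let partial results be rescaled as
    new keys arrive -- the numerical trick behind FlashAttention-style kernels.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    m, l = -np.inf, 0.0
    acc = np.zeros_like(V[0])
    for k, v in zip(K, V):              # real kernels process tiles, not rows
        s = (q @ k) * scale             # attention score for this key
        m_new = max(m, s)
        alpha = np.exp(m - m_new)       # rescale previously accumulated sums
        p = np.exp(s - m_new)
        l = l * alpha + p
        acc = acc * alpha + p * v
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(16, 8)), rng.normal(size=(16, 8))

# Reference: materialize all scores, softmax, then weight the values.
s = (K @ q) / np.sqrt(8)
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
assert np.allclose(online_softmax_attention(q, K, V), ref)
```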
Lightning-fast engine that serves any AI model (LLMs, vision, audio) at scale, with no YAML configuration and automatic GPU autoscaling.
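This description matches Lightning AI's LitServe, though the blurb doesn't name the project; assuming LitServe, a minimal server sketch:

```python
import litserve as ls  # pip install litserve (assuming this is LitServe)

class EchoAPI(ls.LitAPI):
    def setup(self, device):
        # Load models/weights here once per worker; `device` is assigned
        # automatically (CPU or a specific GPU).
        self.device = device

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        return x[::-1]  # stand-in for real model inference

    def encode_response(self, output):
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(EchoAPI(), accelerator="auto")
    server.run(port=8000)
```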
Pythonic framework for injecting experimental KV-cache optimizations into Hugging Face Transformers stacks.
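The injection point in Transformers is the cache object itself. A deliberately naive sketch, assuming a `transformers` release where `DynamicCache` exposes mutable `key_cache`/`value_cache` lists (the cache internals have changed across versions); real libraries use attention-aware scoring rather than blind truncation, which would desynchronize positions mid-generation:

```python
import torch
from transformers import DynamicCache  # pip install transformers

def evict_to_recent_window(cache: DynamicCache, window: int) -> None:
    """Keep only the most recent `window` tokens in every layer's KV cache.

    Naive sliding-window eviction, shown only to mark the injection point;
    smarter policies score entries (e.g. by attention mass) before evicting.
    """
    for i in range(len(cache.key_cache)):
        # Tensors are shaped [batch, num_heads, seq_len, head_dim].
        cache.key_cache[i] = cache.key_cache[i][..., -window:, :]
        cache.value_cache[i] = cache.value_cache[i][..., -window:, :]

# Populate one layer with 10 fake tokens, then shrink it to 4.
cache = DynamicCache()
k = torch.randn(1, 4, 10, 16)
v = torch.randn(1, 4, 10, 16)
cache.update(k, v, layer_idx=0)
evict_to_recent_window(cache, window=4)
print(cache.key_cache[0].shape)  # torch.Size([1, 4, 4, 16])
```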
Distributed KV-cache store & transfer engine that decouples prefilling from decoding to scale vLLM serving clusters.
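To make "decoupling prefilling from decoding" concrete, a toy sketch of the hand-off: one worker computes KV blocks and publishes them to a shared store, another pulls them and decodes. Real engines transfer tensors over RDMA/NVLink; here a dict and a queue stand in:

```python
import queue
import threading

kv_store: dict[str, list[str]] = {}          # request_id -> "KV cache" blocks
ready: "queue.Queue[str]" = queue.Queue()    # hand-off channel between stages

def prefill_worker(request_id: str, prompt: str) -> None:
    # Prefill: compute per-token KV entries (faked here) and publish them.
    kv_store[request_id] = [f"kv({tok})" for tok in prompt.split()]
    ready.put(request_id)                    # signal the decode side

def decode_worker() -> None:
    # Decode: pull the finished KV cache once, then generate locally.
    request_id = ready.get()
    kv = kv_store.pop(request_id)
    print(f"{request_id}: decoding with {len(kv)} cached blocks")

threading.Thread(target=prefill_worker, args=("req-1", "hello world again")).start()
decode_worker()
```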
The vLLM project’s control plane for orchestrating cost-efficient, plug-and-play LLM inference infrastructure.
NVIDIA Dynamo is an open-source, high-throughput, low-latency inference framework that scales generative-AI and reasoning models across large, multi-node GPU clusters.