Microsoft’s high-performance, cross-platform inference engine for ONNX and GenAI models.
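A minimal sketch of the `onnxruntime` Python API; the model filename and input shape below are placeholders for whatever model you load:

```python
import numpy as np
import onnxruntime as ort

# Load the model; the CPU provider is always available, GPU providers are optional.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Feed a dummy tensor matching the model's first input and run inference.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```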
Open-source framework for building, shipping and running containerized AI services with a single command.
Hugging Face’s Rust + Python server for high-throughput, multi-GPU text generation.
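A minimal sketch of querying such a server with the `huggingface_hub` client, assuming an instance is already running at `http://localhost:8080` (the URL and generation settings are placeholders):

```python
from huggingface_hub import InferenceClient

# Point the client at the locally running text-generation server.
client = InferenceClient("http://localhost:8080")

# With streaming off, text_generation returns the generated string directly.
completion = client.text_generation(
    "Explain what continuous batching is in one sentence.",
    max_new_tokens=64,
)
print(completion)
```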
GGML-based C/C++ implementation that runs LLaMA-family models locally with no dependencies.
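The project itself ships as a C/C++ binary, but the separate `llama-cpp-python` bindings expose the same engine from Python; a minimal sketch, with the GGUF path as a placeholder:

```python
from llama_cpp import Llama

# Load a locally quantized GGUF model; n_ctx sets the context window.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=2048)

# Plain completion call; the result follows an OpenAI-style response layout.
result = llm("Q: What is a GGUF file? A:", max_tokens=64, stop=["Q:"])
print(result["choices"][0]["text"])
```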
Local-first LLM ecosystem from Nomic AI that runs quantized chat models on everyday CPUs and GPUs, with a desktop app, Python bindings and a REST API.
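A minimal sketch of the Python bindings; the model filename is a placeholder, and the library downloads it to a local cache on first use:

```python
from gpt4all import GPT4All

# The model file is fetched automatically the first time it is requested.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

# chat_session keeps conversation state for multi-turn use.
with model.chat_session():
    print(model.generate("Name two benefits of 4-bit quantization.", max_tokens=128))
```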
Universal LLM deployment engine that compiles models with TVM Unity for native execution across GPUs, CPUs, mobile and WebGPU.
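A minimal sketch of the `mlc_llm` Python engine, assuming a prebuilt MLC-format model pulled from Hugging Face (the model string below is a placeholder following the project's `HF://` convention):

```python
from mlc_llm import MLCEngine

# A model already compiled/packaged in MLC format, fetched from Hugging Face.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# The engine exposes an OpenAI-style chat completions interface.
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "What does TVM compile here?"}],
    model=model,
    stream=False,
)
print(response.choices[0].message.content)
engine.terminate()
```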
Toolkit from InternLM for compressing, quantizing and serving LLMs with INT4/INT8 kernels on GPUs.
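A minimal sketch of the offline `pipeline` API from the `lmdeploy` package; the model ID is a placeholder:

```python
from lmdeploy import pipeline

# Builds an inference engine under the hood for the given model.
pipe = pipeline("internlm/internlm2_5-7b-chat")

# Batch inference over a list of prompts; each response carries its generated text.
responses = pipe(["What does INT4 weight quantization trade off?"])
print(responses[0].text)
```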
Xorbits’ universal inference layer (library name `xinference`) that deploys and serves LLMs and multimodal models from laptop to cluster.
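Because Xinference serves an OpenAI-compatible REST API, a minimal sketch can reuse the stock `openai` client; it assumes a local server on the default port 9997 with a model already launched under the placeholder name `my-llm`:

```python
from openai import OpenAI

# Xinference's endpoint mirrors the OpenAI schema, so the standard client works.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-llm",  # placeholder: whatever name/UID the model was launched under
    messages=[{"role": "user", "content": "Summarize what an inference layer does."}],
)
print(resp.choices[0].message.content)
```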
NVIDIA’s open-source library that compiles Transformer models into highly optimized TensorRT engines for low-latency, high-throughput LLM inference on NVIDIA GPUs.
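A minimal sketch of the high-level `LLM` API available in recent `tensorrt_llm` releases (the Hugging Face model ID and sampling settings are placeholders); engine compilation happens on first load:

```python
from tensorrt_llm import LLM, SamplingParams

# Compiles the model into a TensorRT engine, then runs generation on it.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Why do fused attention kernels help latency?"], params)
print(outputs[0].outputs[0].text)
```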
CUDA kernel library that brings FlashAttention-style optimizations to any LLM serving stack.
Lightning-fast engine that serves any AI model (LLMs, vision, audio) at scale, with zero YAML configuration and built-in GPU autoscaling.
Pythonic framework to inject experimental KV-cache optimizations into Hugging Face Transformers stacks.