
Explore by tags

  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ocr
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

vLLM

2023
Woosuk Kwon, Zhuohan Li +7

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs), built to deliver state-of-the-art performance on GPUs with features such as PagedAttention and continuous batching.

ai-development, ai-library, ai-inference, ai-serving
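
For a taste of the API, here is a minimal offline-inference sketch using vLLM's documented Python entry points; the model name and sampling settings are arbitrary examples:

    from vllm import LLM, SamplingParams

    # Example model; any Hugging Face causal LM supported by vLLM works.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)

    # vLLM batches prompts internally (continuous batching + PagedAttention).
    outputs = llm.generate(["What is PagedAttention?"], params)
    for out in outputs:
        print(out.outputs[0].text)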

KTransformers

2024
MADSys Lab, Tsinghua University, Approaching.AI +17

KTransformers is a flexible framework for experimenting with cutting-edge optimizations in LLM inference and fine-tuning, with a focus on CPU-GPU heterogeneous computing. It consists of two core modules: kt-kernel, which provides high-performance inference kernels, and kt-sft, which handles fine-tuning. The project supports a range of hardware and models such as the DeepSeek series and Kimi-K2, delivering significant resource savings and speedups: for example, it reduces the GPU memory needed for a 671B-parameter model to 70 GB and achieves up to 28x acceleration.

github, llm, ai-inference, ai-train, ai-framework +3

SGLang

2024
LMSYS Org

SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models, designed for low-latency, high-throughput inference anywhere from a single GPU to large distributed clusters. Key features include RadixAttention for prefix caching, zero-overhead batch scheduling, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, and quantization (FP4/FP8/INT4/AWQ/GPTQ). It supports a wide range of models such as Llama, Qwen, and DeepSeek, and hardware from NVIDIA, AMD, and Intel as well as TPUs, with an intuitive frontend for building LLM applications.

llm, ai-serving, ai-inference, nvidia, pytorch +3
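
A minimal serving sketch, assuming SGLang's standard launch command and its OpenAI-compatible endpoint; the model path and port below are placeholders:

    # Launch the server first (shell):
    #   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
    import openai

    # SGLang exposes an OpenAI-compatible API; no real key is needed locally.
    client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "What is RadixAttention?"}],
    )
    print(resp.choices[0].message.content)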

Ollama

2023
Jeffrey Morgan, Michael Chiang

A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.

ai-development, ai-library, ai-inference, ai-serving, LLM
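
A minimal sketch of calling Ollama's local REST API; it assumes the daemon is running on its default port and that the example model has already been pulled (e.g. `ollama pull llama3`):

    import requests

    # Ollama listens on localhost:11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])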

TensorFlow Serving

2016
Google

An open-source, production-ready system for serving machine-learning models at scale.

ai-development, ai-library, ai-inference, ai-serving, google
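
A minimal sketch of querying TensorFlow Serving's REST predict endpoint; the model name, mount path, and input shape are placeholders that must match the SavedModel being served:

    # Serve a SavedModel first (shell); paths are examples:
    #   docker run -p 8501:8501 \
    #     -v /tmp/my_model:/models/my_model -e MODEL_NAME=my_model tensorflow/serving
    import requests

    # "my_model" is a placeholder model name; the instance shape must
    # match the served model's input signature.
    resp = requests.post(
        "http://localhost:8501/v1/models/my_model:predict",
        json={"instances": [[1.0, 2.0, 3.0]]},
    )
    print(resp.json()["predictions"])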

TensorRT

2016
NVIDIA

NVIDIA TensorRT is an SDK and tool suite that compiles and optimizes trained neural-network models for fast, low-latency inference on NVIDIA GPUs.

ai-development, ai-library, ai-inference, ai-serving, nvidia
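
A minimal engine-build sketch using the TensorRT 8.x-style Python API (exact calls vary by TensorRT version); the ONNX input and engine output paths are placeholders:

    import tensorrt as trt

    # Parse an ONNX model into a TensorRT network definition.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:  # placeholder path
        assert parser.parse(f.read()), parser.get_error(0)

    # Compile to a serialized engine, enabling FP16 kernels where supported.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    engine = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(engine)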

ONNX

2017
ONNX Project Contributors, Meta (Facebook) +1

ONNX (Open Neural Network Exchange) is an open ecosystem that provides an open-source format for AI models, covering both deep learning and traditional ML. It defines an extensible computation-graph model, built-in operators, and standard data types, with a focus on inference. Widely supported across frameworks and hardware, it enables interoperability and accelerates AI innovation.

ai-framework, mlops, ai-inference, ai-serving, pytorch +2
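
A minimal round-trip sketch: export a toy PyTorch module to ONNX, then run it with ONNX Runtime independently of the original framework; the tensor names are just example labels:

    import torch
    import onnxruntime as ort

    # A tiny linear layer keeps the example self-contained.
    model = torch.nn.Linear(4, 2)
    dummy = torch.randn(1, 4)
    torch.onnx.export(model, dummy, "linear.onnx",
                      input_names=["input"], output_names=["output"])

    # Run the exported graph with ONNX Runtime.
    sess = ort.InferenceSession("linear.onnx")
    out = sess.run(None, {"input": dummy.numpy()})
    print(out[0].shape)  # (1, 2)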

PaddleOCR

2019
PaddlePaddle

PaddleOCR is an industry-leading, production-ready OCR and document AI engine developed by the PaddlePaddle team. It supports over 100 languages and converts PDFs or image documents into structured AI-friendly data (e.g., JSON and Markdown), bridging the gap between images/PDFs and LLMs. Key features include multilingual support, high accuracy, handwriting recognition, and advanced document parsing for elements like tables, formulas, and charts, with end-to-end tools for training, inference, and deployment.

github, ai-tools, ai-image, vision, ai-inference +2
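
A minimal recognition sketch in the classic 2.x-style Python API (arguments and result format vary across PaddleOCR releases); the image path is a placeholder:

    from paddleocr import PaddleOCR

    # use_angle_cls enables text-direction classification; lang picks the model.
    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    result = ocr.ocr("invoice.png")  # placeholder image path
    for line in result[0]:
        box, (text, score) = line  # bounding box, recognized text, confidence
        print(f"{score:.2f}  {text}")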

ExecuTorch

2022
PyTorch

ExecuTorch is PyTorch’s unified on-device AI deployment solution for mobile, embedded, and edge devices. It enables direct export from PyTorch, with ahead-of-time compilation, quantization, and hardware partitioning producing compact runtime programs (.pte) that run across many backends (XNNPACK, Vulkan, CoreML, Qualcomm, etc.). It supports LLM, vision, speech, and multimodal models with a small runtime footprint, plus production tools for profiling, memory planning, and selective operator builds.

pytorch, ai-inference, ai-serving, ai-framework, mlops +3
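
A minimal export sketch, with the caveat that the ExecuTorch export API has shifted between releases; the toy module and output path are placeholders:

    import torch
    from executorch.exir import to_edge

    # Export with torch.export, lower to ExecuTorch's Edge dialect, and
    # serialize a .pte program for the on-device runtime.
    model = torch.nn.Linear(4, 2).eval()
    exported = torch.export.export(model, (torch.randn(1, 4),))
    program = to_edge(exported).to_executorch()
    with open("linear.pte", "wb") as f:
        f.write(program.buffer)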

FunASR

2022
Alibaba DAMO Academy, Northwestern Polytechnical University (NWPU) +5

FunASR is an open-source, end-to-end automatic speech recognition (ASR) toolkit led by Alibaba DAMO Academy. It supports ASR, voice activity detection (VAD), punctuation restoration, speaker verification/diarization, multi-talker ASR, emotion recognition, and more. FunASR provides many industrial-grade pretrained models, inference scripts, and deployment runtimes for research and production use.

ASR, audio, pytorch, ai-library, huggingface +4
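
A minimal transcription sketch following the AutoModel pattern from FunASR's README; the model names and audio path are examples:

    from funasr import AutoModel

    # Paraformer ASR plus VAD and punctuation-restoration models.
    model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad",
                      punc_model="ct-punc")
    res = model.generate(input="speech.wav")  # placeholder audio path
    print(res[0]["text"])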

LocalAI

2023
Ettore Di Giacinto

LocalAI is a free, open-source alternative to OpenAI: a drop-in replacement REST API compatible with the OpenAI API specification for local AI inference. It lets you run LLMs and generate images, audio, and more, locally or on-prem on consumer-grade hardware, supporting multiple model families without requiring a GPU.

github, ai-inference, ai-serving, openai, ai-tools +4
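
A minimal sketch of the drop-in compatibility: point the standard OpenAI Python client at a running LocalAI instance (default port 8080); the model name maps to whatever model is configured locally under that name:

    import openai

    # LocalAI serves an OpenAI-compatible API; any key string works.
    client = openai.OpenAI(base_url="http://localhost:8080/v1",
                           api_key="not-needed")
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder alias for a locally configured model
        messages=[{"role": "user", "content": "Hello from LocalAI"}],
    )
    print(resp.choices[0].message.content)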

BISHENG

2023
DataElement (dataelement)

BISHENG is an open LLM application DevOps platform for building enterprise AI applications. It provides GenAI workflows, RAG, agents, unified model management, evaluation, SFT, dataset management, enterprise-grade system management, and observability. Its key strengths are a powerful visual workflow/orchestration system (with loops, parallelism, and human-in-the-loop), high-precision document parsing, and features targeted at production enterprise deployments.

mlops, llm, RAG, ai-workflow, ai-agent +5