LogoAIAny
Explore by tags

Learn Anything about AI in one site.

support@aiany.app

Copyright © 2025 All Rights Reserved.
  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • openai
  • paper
  • physics
  • plugin
  • RL
  • science
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai


GPT4All

2023
Nomic AI

Local-first LLM ecosystem from Nomic AI that runs quantized chat models on everyday CPUs and GPUs with a desktop app, Python bindings and REST API.

ai-development · ai-inference · ai-serving

MLC-LLM

2023
MLC AI Lab

Universal LLM deployment engine that compiles models with TVM Unity for native execution across GPUs, CPUs, mobile and WebGPU.

ai-development · ai-inference · ai-serving

LMDeploy

2023
InternLM Team

Toolkit from InternLM for compressing, quantizing and serving LLMs with INT4/INT8 kernels on GPUs.

ai-development · ai-inference · ai-serving

Xinference

2023
Xprobe Inc.

Xorbits’ universal inference layer (library name `xinference`) that deploys and serves LLMs and multimodal models from laptop to cluster.

ai-development · ai-inference · ai-serving

TensorRT-LLM

2023
NVIDIA

NVIDIA’s open-source library that compiles Transformer blocks into highly optimized TensorRT engines for fast LLM inference on NVIDIA GPUs.

ai-development · ai-inference · ai-serving · nvidia

FlashInfer

2023
FlashInfer Team

CUDA kernel library that brings FlashAttention-style optimizations to any LLM serving stack.

ai-development · ai-inference · ai-serving

LitServe

2024
Lightning AI

Lightning-fast engine that lets you serve any AI model—LLMs, vision, audio—at scale with zero YAML and automatic GPU autoscaling.

ai-development · ai-inference · ai-serving

KTransformers

2024
KVCache-AI Team

Pythonic framework for injecting experimental KV-cache optimizations into Hugging Face Transformers stacks.

ai-development · ai-inference · ai-serving

Mooncake

2024
KVCache-AI Team

Distributed KV-cache store and transfer engine that decouples prefilling from decoding to scale vLLM serving clusters.

ai-development · ai-inference · ai-serving

AIBrix

2025
vLLM Project

Control plane from the vLLM project that orchestrates cost-efficient, plug-and-play LLM inference infrastructure.

ai-development · ai-inference · ai-serving

NVIDIA Dynamo

2025
NVIDIA

Open-source, high-throughput, low-latency inference framework that scales generative AI and reasoning models across large multi-node GPU clusters.

ai-development · ai-inference · ai-serving · nvidia