Xorbits’ universal inference layer (the `xinference` library) that deploys and serves LLMs and multimodal models from laptop to cluster.
LiteLLM is an open-source LLM gateway and Python SDK that lets developers call more than 100 commercial and open-source models through a single OpenAI-compatible interface, complete with cost tracking, rate limiting, load balancing and guardrails.
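The core gateway pattern, one OpenAI-style entry point that routes by model name and logs spend, can be sketched in plain Python. This is a conceptual sketch, not LiteLLM's actual internals; the provider handlers, prefixes, and prices below are stand-ins.

```python
# Minimal sketch of the gateway pattern (NOT LiteLLM's implementation): a
# single completion() entry point routes an OpenAI-style request to a
# provider-specific handler and records per-call cost.
from typing import Callable

# Registry of prefix -> (handler, hypothetical price per 1k tokens).
PROVIDERS: dict[str, tuple[Callable, float]] = {}

def register(prefix: str, price_per_1k: float):
    """Register a provider handler for model names starting with `prefix`."""
    def wrap(fn):
        PROVIDERS[prefix] = (fn, price_per_1k)
        return fn
    return wrap

@register("openai/", price_per_1k=0.01)
def call_openai(messages):          # stand-in for a real API call
    return "openai says: " + messages[-1]["content"]

@register("ollama/", price_per_1k=0.0)
def call_ollama(messages):          # stand-in for a local model
    return "ollama says: " + messages[-1]["content"]

spend_log: list[float] = []

def completion(model: str, messages: list[dict]) -> str:
    """Single OpenAI-compatible entry point: route by model prefix, log cost."""
    for prefix, (handler, price) in PROVIDERS.items():
        if model.startswith(prefix):
            reply = handler(messages)
            tokens = len(reply.split())            # crude token estimate
            spend_log.append(tokens / 1000 * price)
            return reply
    raise ValueError(f"no provider registered for {model!r}")

print(completion("ollama/llama3", [{"role": "user", "content": "hi"}]))
```

The same call signature works regardless of which backend serves the model, which is the property that makes swapping providers a one-string change.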
NVIDIA’s open-source library that compiles Transformer blocks into highly optimized TensorRT engines for blazing-fast LLM inference on NVIDIA GPUs.
CUDA kernel library that brings FlashAttention-style optimizations to any LLM serving stack.
High-performance Python framework and platform for orchestrating collaborative agent “crews”.
FastGPT is an open-source AI knowledge-base platform that combines RAG retrieval, visual workflows and multi-model support to build domain-specific chatbots quickly.
Zero-code CLI & WebUI to fine-tune 100+ LLMs/VLMs with LoRA, QLoRA, PPO, DPO and more.
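The LoRA technique named above has a simple mathematical core: freeze the pretrained weight W and train only a low-rank delta, applying W_eff = W + (alpha / r) * B @ A. A NumPy sketch of that idea, with illustrative sizes (not this tool's code):

```python
# LoRA in miniature: a frozen (d x k) weight plus a trainable rank-r update,
# which cuts trainable parameters from d*k to r*(d + k).
import numpy as np

d, k, r, alpha = 512, 512, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))            # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01     # trainable, small random init
B = np.zeros((d, r))                   # trainable, zero init: W_eff == W at start

def forward(x):
    return x @ (W + (alpha / r) * B @ A).T

full_params = d * k
lora_params = r * (d + k)
print(f"trainable params: {lora_params} vs {full_params}")  # 8192 vs 262144
```

The zero init of B means fine-tuning starts exactly at the pretrained model's behavior, and only the small A and B matrices ever receive gradients.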
Microsoft Research approach that enriches RAG with knowledge-graph structure and community summaries.
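The structure that distinguishes this approach from plain RAG, an entity graph whose communities are pre-summarized and then retrieved, can be shown with a toy example. This is a conceptual sketch (connected components stand in for real community detection, and a string stands in for an LLM-written summary), not Microsoft's implementation.

```python
# Toy GraphRAG-style pipeline: build an entity graph, group entities into
# communities, pre-summarize each community, answer from the best summary.
from collections import defaultdict

edges = [("Ada", "Analytical Engine"), ("Babbage", "Analytical Engine"),
         ("Turing", "Enigma"), ("Bletchley", "Enigma")]

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b); adj[b].add(a)

def communities():
    """Connected components, standing in for real community detection."""
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Stand-in for an LLM-written community summary.
summaries = {frozenset(c): "community about: " + ", ".join(sorted(c))
             for c in communities()}

def answer(query_entities):
    """Retrieve the summary of the community that overlaps the query most."""
    best = max(summaries, key=lambda c: len(c & set(query_entities)))
    return summaries[best]

print(answer({"Turing"}))
```

Answering from a community summary rather than isolated chunks is what lets graph-augmented RAG handle broad "tell me about X" questions that no single chunk covers.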
An open-source, Ray-based framework for scalable Reinforcement Learning from Human Feedback (RLHF).
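At the heart of any RLHF pipeline sits reward modeling, typically trained with a Bradley-Terry preference loss: -log(sigmoid(r_chosen - r_rejected)), which pushes the reward model to score the human-preferred response higher. A numerical sketch of that objective (illustrative values, not this framework's code):

```python
# Bradley-Terry preference loss used to train RLHF reward models: the loss
# shrinks as the margin between chosen and rejected rewards grows.
import math

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    return -math.log(sigmoid(r_chosen - r_rejected))

for margin in (0.0, 1.0, 3.0):
    print(f"margin={margin}: loss={preference_loss(margin, 0.0):.4f}")
```

At zero margin the loss is log 2 (the model cannot tell the responses apart), and it decays toward zero as the preferred response's reward pulls ahead.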
An integration & tooling platform that equips AI agents and LLM apps with 300-plus pre-built, authenticated tools and event triggers.
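The tool-integration pattern such platforms provide, tools registered under a name with a description and a dispatcher that executes the tool call an LLM emits, can be sketched in a few lines. The tool names and payload shape below are made up for illustration; they are not this platform's API.

```python
# Minimal tool registry + dispatcher for agent tool-calling (conceptual
# sketch, not a real integration platform's API).
TOOLS: dict = {}

def tool(name: str, description: str):
    """Register a function as a callable tool under `name`."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@tool("send_email", "Send an email to a recipient")
def send_email(to: str, subject: str) -> str:
    return f"queued email to {to}: {subject}"   # stand-in for a real API call

@tool("create_issue", "Open a ticket in the tracker")
def create_issue(title: str) -> str:
    return f"issue created: {title}"            # stand-in for a real API call

def dispatch(tool_call: dict) -> str:
    """Execute a {'name': ..., 'arguments': {...}} call emitted by a model."""
    entry = TOOLS[tool_call["name"]]
    return entry["fn"](**tool_call["arguments"])

print(dispatch({"name": "send_email",
                "arguments": {"to": "ops@example.com", "subject": "hi"}}))
```

What platforms in this category add on top of this skeleton is the hard part: hundreds of maintained integrations, per-user authentication, and event triggers.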
Lightning-fast engine that lets you serve any AI model—LLMs, vision, audio—at scale with zero YAML and automatic GPU autoscaling.
LangGraph is LangChain’s open-source orchestration framework that lets you compose long-running, stateful LLM agents as graphs of nodes and edges.
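The graph-of-nodes pattern this formalizes can be shown in a toy form: nodes are functions that read and update shared state, and edges, possibly conditional, pick the next node until a terminal marker. This is a conceptual sketch of the pattern, not LangGraph's actual API; the node names and the review rule are invented.

```python
# Toy stateful graph: a draft/review loop that re-drafts until approved
# (conceptual sketch of graph orchestration, not LangGraph's API).
END = "__end__"

def draft(state):
    state["revision"] = state.get("revision", 0) + 1
    state["text"] = f"draft v{state['revision']}"
    return state

def review(state):
    state["approved"] = state["revision"] >= 2   # stand-in for an LLM check
    return state

nodes = {"draft": draft, "review": review}
edges = {"draft": lambda s: "review",
         "review": lambda s: END if s["approved"] else "draft"}

def run(entry: str, state: dict) -> dict:
    """Execute nodes, following (conditional) edges, until END is reached."""
    node = entry
    while node != END:
        state = nodes[node](state)
        node = edges[node](state)
    return state

print(run("draft", {}))
```

Because the loop is an explicit edge back to `draft` rather than hidden control flow, the whole agent is inspectable and its state can be checkpointed between steps, which is the property long-running agents need.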