AI operating system that turns any LLM into a stateful agent with long-term memory.
Pythonic framework for injecting experimental KV-cache optimizations into Hugging Face Transformers stacks.
Screen-parsing module from Microsoft Research that converts UI screenshots into structured elements for vision-based GUI agents.
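The core idea, converting a screenshot into structured, LLM-readable elements, can be sketched with a hypothetical schema (the field names and text rendering below are illustrative assumptions, not the module's actual output format):

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str                              # e.g. "icon" or "text"
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized
    interactive: bool                      # whether the element is clickable
    caption: str                           # short description of the element

def to_prompt(elements: list[UIElement]) -> str:
    # Render detected elements as numbered lines so an LLM agent can
    # refer to them by ID when choosing an action.
    return "\n".join(
        f"[{i}] {e.kind} '{e.caption}' at {e.bbox} interactive={e.interactive}"
        for i, e in enumerate(elements)
    )

elems = [UIElement("icon", (0.1, 0.2, 0.15, 0.25), True, "settings gear")]
print(to_prompt(elems))
# → [0] icon 'settings gear' at (0.1, 0.2, 0.15, 0.25) interactive=True
```

The numbered-ID rendering is the key design choice: a downstream agent can answer "click element [0]" instead of predicting raw pixel coordinates.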
Open-source AI-powered browser-automation framework that exposes a website’s interactive elements in a simple, text-like format so that LLM agents can read pages and carry out multi-step tasks on their own.
Distributed KV-cache store & transfer engine that decouples prefilling from decoding to scale vLLM serving clusters.
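The prefill/decode split can be illustrated with a minimal, self-contained sketch (a plain dict stands in for the distributed store; none of the names below are the project's actual API):

```python
# Illustrative sketch of disaggregated serving: a prefill worker runs the
# expensive prompt pass once and publishes the KV cache to a shared store;
# a separate decode worker fetches it by request ID and generates tokens
# without re-running prefill.

kv_store: dict[str, list[str]] = {}  # stand-in for a distributed KV-cache store

def prefill_worker(request_id: str, prompt: str) -> int:
    # Compute per-token KV entries (faked here as strings) and publish them.
    kv_cache = [f"kv({tok})" for tok in prompt.split()]
    kv_store[request_id] = kv_cache
    return len(kv_cache)

def decode_worker(request_id: str, n_tokens: int) -> list[str]:
    # Fetch the prefilled cache by reference and decode against it.
    kv_cache = kv_store[request_id]
    return [f"tok{i}(ctx={len(kv_cache)})" for i in range(n_tokens)]

prefill_worker("req1", "the quick brown fox")
print(decode_worker("req1", 2))
# → ['tok0(ctx=4)', 'tok1(ctx=4)']
```

Because the two workers share state only through the store, each side can be scaled independently, which is the point of decoupling prefill from decode.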
Volcano Engine Reinforcement Learning library for efficient LLM post-training; the open-source implementation of the HybridFlow framework.
RAGFlow is InfiniFlow’s open-source Retrieval-Augmented Generation engine focused on deep-document understanding and scalable multi-format ingestion.
vLLM project’s control plane that orchestrates cost-efficient, plug-and-play LLM inference infrastructure.
NVIDIA Dynamo is an open-source, high-throughput, low-latency inference framework that scales generative-AI and reasoning models across large, multi-node GPU clusters.