vllm-omni

vLLM-Omni is an open-source framework from the vLLM community for efficient inference and serving of omni-modality models. It extends vLLM's fast autoregressive serving to multi-modal data (text, image, video, audio), non-autoregressive architectures, and heterogeneous outputs, and it integrates with Hugging Face models while offering pipeline parallelism, KV-cache optimizations, and an OpenAI-compatible API.

Introduction

Overview

vLLM-Omni is an extension of the vLLM ecosystem focused on serving and inference for omni-modality (multi-modal) models. While vLLM originally targeted large autoregressive text models, vLLM-Omni broadens that capability to support images, video, and audio, plus non-autoregressive generation architectures such as diffusion transformers.

Key features
  • Omni-modality support: processes text, images, video and audio within the same serving framework.
  • Multi-architecture support: handles autoregressive and non-autoregressive models (e.g., DiT, diffusion-like models) and heterogeneous outputs (text, images, multimodal responses).
  • Performance optimizations: inherits vLLM's efficient KV-cache management for autoregressive models, implements pipelined stage execution to increase throughput, and supports disaggregated execution with dynamic resource allocation.
  • Flexible pipeline abstraction: provides heterogeneous pipeline primitives to compose complex model workflows and to integrate multiple stages (pre/post-processing, model stages, decoders).
  • Integration with Hugging Face: seamless support for many open-source models available on Hugging Face, including omni models such as Qwen-Omni and Qwen-Image.
  • Scalability & parallelism: supports tensor/pipeline/data/expert parallelism for distributed inference.
  • Developer ergonomics & APIs: streaming outputs, an OpenAI-compatible API server (a client sketch follows this list), and documentation/quickstart guides for easy adoption.
  • Open license: distributed under the Apache License 2.0.
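
Because the server speaks the OpenAI API, clients can reuse the standard openai Python SDK. The sketch below is a minimal, hypothetical example: the base URL, port, placeholder API key, model name (Qwen/Qwen2.5-Omni-7B), and image URL are assumptions about one particular deployment, not values documented by the project.

    # Hypothetical client request against a vLLM-Omni OpenAI-compatible server.
    # Assumes a server is already running on localhost:8000 and serving an omni
    # model; the model name and image URL below are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    stream = client.chat.completions.create(
        model="Qwen/Qwen2.5-Omni-7B",  # placeholder model identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }],
        stream=True,  # streaming outputs, as listed above
    )

    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)

Non-streaming requests work the same way with stream=False, in which case the full reply is available on response.choices[0].message.content.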

Typical uses
  • Deploying multi-modal foundation models for production inference (e.g., vision+language assistants); an offline-inference sketch follows this list.
  • Serving diffusion/parallel-generation models with high throughput and lower latency.
  • Building pipelines that combine multiple model types or modalities and need coordinated resource allocation.
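
For the first use case, offline batch inference can also be scripted directly in Python. Since vLLM-Omni builds on vLLM, the sketch below follows vLLM's existing multimodal offline-inference pattern (LLM.generate with a multi_modal_data field); whether vLLM-Omni exposes exactly this interface, and the model name and prompt template used here, are assumptions made for illustration.

    # Illustrative offline-inference sketch in the style of vLLM's Python API.
    # The model name and chat-style prompt template are placeholders; consult the
    # project's supported-models docs for the exact format each model expects.
    from PIL import Image
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-Omni-7B")  # placeholder omni model

    image = Image.open("cat.jpg")
    prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

    outputs = llm.generate(
        {
            "prompt": prompt,
            "multi_modal_data": {"image": image},  # vLLM's multimodal input field
        },
        SamplingParams(temperature=0.2, max_tokens=128),
    )

    print(outputs[0].outputs[0].text)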

Who it's for
  • MLOps and infrastructure engineers who need a performant, production-ready inference stack for multi-modal models.
  • Researchers and developers who want to prototype or serve multi-modal models with Hugging Face compatibility and OpenAI-style APIs.
Documentation & community
  • Documentation / Quickstart: the project provides hosted docs and guides for installation, supported models, and contribution.
  • Community: a vLLM user forum and developer Slack are available for support and discussion.
License

vLLM-Omni is released under the Apache License 2.0.
