Project AIRI — Overview
Project AIRI is an open-source initiative to build "cyber living" virtual characters (AI waifu / digital companions / VTubers) that can chat, act, and play games autonomously. Heavily inspired by Neuro-sama, AIRI focuses on providing a self-hosted, extensible stack that integrates modern LLMs, real-time audio, model orchestration, memory/RAG, and multi-platform frontends.
Key features
- Brain: multi-provider LLM integrations (OpenAI, Anthropic, vLLM, Google Gemini, Ollama, Mistral, Groq, and many others via the xsAI adapter), prompt engineering, and agent orchestration.
- Game agents: out-of-the-box capabilities for playing Minecraft and Factorio (includes demos and separate subprojects for deeper integration).
- Real-time audio: browser and Discord audio input, client-side speech recognition, VAD/talking detection, and TTS support (e.g., ElevenLabs); a minimal talking-detection sketch follows this list.
- Avatar & presentation: supports VRM and Live2D models with animations (auto-blink, auto look-at, idle eye movement) and Web-based rendering using WebGPU / WebAssembly where applicable; a VRM loading sketch follows this list.
- Memory & retrieval: in-browser DB support (DuckDB WASM / PGlite) and RAG-style systems; the memory subsystem is in active development (an in-browser storage sketch follows this list).
- Cross-platform: Stage Web (browser), desktop (Tauri / native with CUDA/Metal support), and mobile (PWA / Capacitor) targets.
- Extensibility: plugin and subproject architecture (xsai, unspeech, duckdb-wasm, tauri plugins, etc.), encouraging community contributions (artists, modellers, engineers).
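
To make the talking-detection idea concrete, here is a minimal energy-gate sketch built on the standard Web Audio API. It is illustrative only: AIRI's actual pipeline uses dedicated VAD/ASR components, and the RMS threshold and the onTalkingChange callback below are invented for the example.

```ts
// Minimal browser talking-detection sketch using the Web Audio API.
// Illustrative only: the 0.02 RMS threshold and the callback are not AIRI APIs.
async function watchMicrophone(onTalkingChange: (talking: boolean) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const ctx = new AudioContext()
  const analyser = ctx.createAnalyser()
  analyser.fftSize = 2048
  ctx.createMediaStreamSource(stream).connect(analyser)

  const samples = new Float32Array(analyser.fftSize)
  let talking = false

  const tick = () => {
    analyser.getFloatTimeDomainData(samples)
    // Root-mean-square energy of the current audio frame.
    const rms = Math.sqrt(samples.reduce((sum, s) => sum + s * s, 0) / samples.length)
    const nowTalking = rms > 0.02 // illustrative threshold
    if (nowTalking !== talking) {
      talking = nowTalking
      onTalkingChange(talking)
    }
    requestAnimationFrame(tick)
  }
  tick()
}
```

An energy gate like this can drive lip-sync or pause speech recognition while the character itself is speaking; a production setup would typically rely on a trained VAD model instead.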
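For the avatar side, the sketch below shows the generic pattern for loading a VRM model into a three.js scene with @pixiv/three-vrm and driving a blink expression each frame. The model path and blink weight are placeholders, and AIRI's own stage renderer may wire this differently.

```ts
import type { Scene } from 'three'
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js'
import { VRMLoaderPlugin, type VRM } from '@pixiv/three-vrm'

// Generic VRM loading sketch; the model path is a placeholder.
async function loadAvatar(scene: Scene): Promise<VRM> {
  const loader = new GLTFLoader()
  loader.register(parser => new VRMLoaderPlugin(parser))

  const gltf = await loader.loadAsync('/models/avatar.vrm')
  const vrm = gltf.userData.vrm as VRM
  scene.add(vrm.scene)
  return vrm
}

// Called once per frame: set expressions (e.g. a blink weight in 0..1), then advance the model.
function updateAvatar(vrm: VRM, delta: number, blinkWeight: number) {
  vrm.expressionManager?.setValue('blink', blinkWeight)
  vrm.update(delta)
}
```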
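And for memory, an in-browser Postgres such as PGlite can keep conversation history entirely client-side. The table layout below is invented for illustration; AIRI's actual memory/RAG schema is still evolving.

```ts
import { PGlite } from '@electric-sql/pglite'

// In-browser memory-store sketch with PGlite (Postgres compiled to WASM).
// The schema is illustrative, not AIRI's actual memory layout.
const db = new PGlite() // in-memory; use 'idb://airi-memory' to persist in IndexedDB

await db.exec(`
  CREATE TABLE IF NOT EXISTS messages (
    id BIGSERIAL PRIMARY KEY,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
  );
`)

export async function remember(role: 'user' | 'assistant', content: string) {
  await db.query('INSERT INTO messages (role, content) VALUES ($1, $2)', [role, content])
}

export async function recentMessages(limit = 20) {
  const result = await db.query(
    'SELECT role, content FROM messages ORDER BY created_at DESC LIMIT $1',
    [limit],
  )
  return result.rows
}
```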
Architecture highlights
- Uses modern Web tech for UI and parts of the runtime (WebGPU, Web Audio, Web Workers, WebAssembly) while offering native paths that leverage CUDA/Apple Metal for heavier model inference via underlying runtimes (e.g., candle, vLLM).
- xsAI acts as the adapter layer to multiple LLM providers and runtimes, enabling flexible provider switching and hybrid local/cloud setups (see the sketch after this list).
- Modular subprojects (e.g., unspeech, hfup, inventory, mcp-launcher) let AIRI handle audio pipelines, model bundling/deployment, centralized model catalogs, and MCP server tooling.
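
Because providers are reached through OpenAI-compatible endpoints, switching between them is largely a matter of changing the base URL and model name. The sketch below follows the generateText-style API that xsAI documents; exact package and option names should be checked against the current xsAI docs, and the local Ollama endpoint shown is the usual default rather than anything AIRI configures for you.

```ts
import process from 'node:process'
import { generateText } from '@xsai/generate-text'

// Provider-switching sketch in the style of xsAI's generateText API.
// Verify package and option names against the current xsAI documentation.
const providers = {
  openai: {
    baseURL: 'https://api.openai.com/v1/',
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY ?? '',
  },
  ollama: {
    baseURL: 'http://localhost:11434/v1/', // usual local Ollama default
    model: 'llama3.2',
    apiKey: '', // no key needed for a local runtime
  },
}

async function ask(provider: keyof typeof providers, prompt: string) {
  const { baseURL, model, apiKey } = providers[provider]
  const { text } = await generateText({
    apiKey,
    baseURL,
    model,
    messages: [
      { role: 'system', content: 'You are AIRI, a cheerful digital companion.' }, // illustrative prompt
      { role: 'user', content: prompt },
    ],
  })
  return text
}

// The call path is identical whether the model runs in the cloud or on the local machine.
console.log(await ask('ollama', 'Say hello to chat!'))
```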
Developer & deployment notes
- Repo includes development scripts and stages:
  - pnpm i
  - pnpm dev (Stage Web)
  - pnpm dev:tamagotchi (desktop Tamagotchi stage)
  - pnpm dev:capacitor (mobile/Capacitor)
- Documentation and developer guides are provided on the official docs site; contributors are welcome in roles ranging from UI design and art to reinforcement learning (RL) and inference engineering.
Supported providers & integrations
AIRI is designed to interoperate with many LLM and service providers (OpenAI, Anthropic, vLLM, Google Gemini, Ollama, Mistral, Groq, xAI, and others), uses services such as ElevenLabs for TTS, integrates with Discord and Telegram for chat, and exposes connectors for RAG and embedding stores.
Use cases
- Personal/self-hosted VTuber or digital companion that streams, chats, and interacts with viewers.
- Research and prototyping platform for multi-modal agents (speech + vision + actions in games).
- Education/demo platform to show agent orchestration, RAG/memory, and multi-provider LLM usage.
Community & ecosystem
- Active GitHub organization and many subprojects (unspeech, airi-factorio, xsai-transformers, duckdb-wasm, etc.).
- Public documentation site, Discord server, social media presence, and devlogs tracking progress and releases.
Status & roadmap notes
- The project is under active development: many components already work (game playing, audio I/O, avatar control), while some subsystems (memory/alaya, pure in-browser full-model inference) remain work in progress.
Quick links
- Repository: GitHub (this URL)
- Docs / Try it: AIRI official site and docs
(End of introduction.)
