Overview
OM1 is a modular, Python-based AI runtime developed by OpenMind for creating and deploying multimodal agents that operate in both digital and physical environments. The platform focuses on robotics use cases (humanoids, quadrupeds, TurtleBot 4, etc.) while remaining applicable to phone apps and web-based agents. OM1 agents can consume varied inputs—camera feeds, LiDAR, web and social media data—and output high-level actions such as movement, speech, and expressive behaviors.
Key Capabilities
- Modular architecture: componentized design for inputs, actions, and policies, implemented in Python for easy extension and integration.
- Multimodal inputs: supports vision, depth/LiDAR, audio (ASR), and external data sources; designed to accept new sensors and data types through input plugins.
- Multiple model endpoints: pre-configured connectors for a range of LLMs and VLMs as well as TTS/ASR services, enabling flexible model selection and fallbacks.
- Hardware integration: plugin system and example connectors for ROS2, Zenoh, CycloneDDS, websockets, serial, and USB; Zenoh is recommended for new development (see the sketch after this list).
- Web-based debugging: WebSim (http://localhost:8000/) provides visual monitoring of agent perception, decisions, and action commands for easier development and debugging.
- Example agents & workflows: includes ready-made agents (e.g., Spot) that demonstrate webcam-based perception, captioning, LLM-driven action planning, and simulated actuation.
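Since Zenoh is the recommended transport for new hardware integrations, the following minimal sketch shows how a command could be published with the eclipse-zenoh Python bindings. The key expression `robot/cmd/move` and the comma-separated payload are illustrative assumptions, not part of OM1's actual topic layout.

```python
# Minimal Zenoh publisher sketch; requires the eclipse-zenoh package.
import time

import zenoh

# Open a session with the default configuration (peer discovery on the local network).
session = zenoh.open(zenoh.Config())

# "robot/cmd/move" and the text payload are placeholders; a real integration
# would follow whatever key expressions the hardware adapter defines.
for _ in range(5):
    session.put("robot/cmd/move", "0.5,0.0,0.0")  # vx, vy, yaw rate as plain text
    time.sleep(1.0)

session.close()
```

A robot-side process would declare a subscriber on the same key expression and translate incoming payloads into motor commands.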
Architecture & Extensibility
OM1 is structured around configurable JSON5 agent definitions that tie inputs -> reasoning (LLM/VLM) -> actions. Hardware adapters implement a hardware abstraction layer (HAL) that accepts elemental commands such as move(x,y,z), pick_up(), or expressive actions (smile, wave). The repo provides examples for integrating with external SDKs (e.g., Unitree SDK) and guidance for creating Dockerized stacks and middleware bridges.
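To make the HAL idea concrete, the sketch below shows what a hardware adapter exposing those elemental commands might look like. The HardwareAdapter protocol and LoggingAdapter class are hypothetical names used for illustration; the real adapter interfaces live in the OM1 repository.

```python
# Illustrative hardware-adapter sketch; names are hypothetical, not OM1's API.
from typing import Protocol


class HardwareAdapter(Protocol):
    """Elemental commands a platform-specific adapter is expected to accept."""

    def move(self, x: float, y: float, z: float) -> None: ...
    def pick_up(self) -> None: ...
    def express(self, gesture: str) -> None: ...  # e.g., "smile", "wave"


class LoggingAdapter:
    """Stand-in adapter that logs commands instead of driving hardware.

    A real adapter would forward these calls over ROS2, Zenoh, CycloneDDS,
    serial/USB, or a vendor SDK such as the Unitree SDK.
    """

    def move(self, x: float, y: float, z: float) -> None:
        print(f"move -> x={x}, y={y}, z={z}")

    def pick_up(self) -> None:
        print("pick_up")

    def express(self, gesture: str) -> None:
        print(f"express -> {gesture}")


if __name__ == "__main__":
    adapter: HardwareAdapter = LoggingAdapter()
    adapter.move(0.5, 0.0, 0.0)
    adapter.express("wave")
```

Swapping LoggingAdapter for a platform-specific implementation illustrates the extension point: the agent's reasoning layer keeps issuing the same elemental commands regardless of the robot underneath.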
Getting Started (high level)
- Clone the repository and initialize submodules.
- Use the uv package manager to create the virtual environment and install dependencies.
- Configure an OpenMind API key (obtained from the OpenMind portal) and update the agent config (e.g., spot.json5) for ASR/TTS and model endpoints.
- Launch an example agent (e.g., `uv run src/run.py spot`) and inspect its behavior in WebSim.
Detailed installation notes for macOS and Linux, including system dependencies such as portaudio and ffmpeg, are included in the repository.
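For convenience, the two launch steps above can be wrapped in a short script that starts the Spot example and waits for WebSim to respond. The uv command and the WebSim URL come directly from the steps above; the polling logic is just an illustrative assumption.

```python
# Convenience sketch: launch the Spot example agent and wait for WebSim.
import subprocess
import time
import urllib.request

WEBSIM_URL = "http://localhost:8000/"

# Start the example agent exactly as in the quick-start step above.
agent = subprocess.Popen(["uv", "run", "src/run.py", "spot"])

# Poll WebSim until it answers, giving up after roughly 30 seconds.
for _ in range(30):
    try:
        with urllib.request.urlopen(WEBSIM_URL, timeout=1) as resp:
            print(f"WebSim is up (HTTP {resp.status}) at {WEBSIM_URL}")
            break
    except OSError:
        time.sleep(1.0)
else:
    print("WebSim did not respond; check the agent logs.")

agent.wait()  # keep the agent attached to the foreground
```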
Robotics & Full Autonomy
OM1 supports full autonomy workflows when combined with companion repositories (unitree-sdk, om1-avatar, om1-video-processor). The project documents a four-service loop for Unitree robots that includes perception, SLAM/navigation (via unitree_sdk), a UI/avatar front-end, and video processing services, demonstrating end-to-end autonomy.
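The shape of that loop can be sketched as a small watchdog that checks each of the four services in turn. The service names follow the list above; the URLs and ports are placeholders, since the real deployment wires these services together through the companion repositories (for example, as separate Docker services).

```python
# Watchdog sketch for the four-service autonomy loop. Service names follow the
# text above; the URLs and ports are placeholders, not the actual deployment.
import time
import urllib.request

SERVICES = {
    "perception": "http://localhost:9001/health",        # placeholder endpoint
    "slam_navigation": "http://localhost:9002/health",    # unitree_sdk-backed service
    "avatar_ui": "http://localhost:9003/health",          # om1-avatar front-end
    "video_processing": "http://localhost:9004/health",   # om1-video-processor
}


def is_up(url: str) -> bool:
    """Return True if the service answers an HTTP request within one second."""
    try:
        with urllib.request.urlopen(url, timeout=1):
            return True
    except OSError:
        return False


while True:
    down = [name for name, url in SERVICES.items() if not is_up(url)]
    print("all services up" if not down else f"down: {', '.join(down)}")
    time.sleep(5.0)
```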
Use Cases
- Research & prototyping of multimodal robot agents that combine LLM reasoning with perception and actuation.
- Educational platforms for teaching robotics and embodied AI using TurtleBot-like robots and simulators.
- Production robot integrations where a modular runtime can be extended to hardware-specific SDKs and middleware.
Resources
- Repository: https://github.com/OpenMind/OM1
- Documentation: https://docs.openmind.org/
- Technical paper: see the arXiv link in the repository README
- Community: X (Twitter) and Discord links are provided in the README
License & Contribution
OM1 is licensed under the MIT License. Contributors are asked to follow the CONTRIBUTING.md guidance in the repo when submitting pull requests.
