OM1 (OpenMind)

OM1 is a modular AI runtime by OpenMind for building and deploying multimodal AI agents across digital environments and physical robots. Written in Python, OM1 ingests diverse sensor and web inputs, integrates multiple LLMs and VLMs, connects to TTS/ASR services, and interfaces with robot hardware via plugins (ROS2, Zenoh, CycloneDDS). It includes a web-based visual debugger (WebSim), example agents, documentation, and a technical paper, enabling developers to build configurable, upgradeable agents for humanoids, quadrupeds, educational robots, and mobile apps.

Introduction

Overview

OM1 is a modular, Python-based AI runtime developed by OpenMind for creating and deploying multimodal agents that operate in both digital and physical environments. The platform focuses on robotics use cases (humanoids, quadrupeds, TurtleBot 4, etc.) while remaining applicable to phone apps and web-based agents. OM1 agents can consume varied inputs—camera feeds, LiDAR, web and social media data—and output high-level actions such as movement, speech, and expressive behaviors.

Key Capabilities
  • Modular architecture: componentized design for inputs, actions, and policies, implemented in Python for easy extension and integration.
  • Multimodal inputs: supports vision, depth/LiDAR, audio (ASR), and external data sources; designed to accept new sensors and data types through input plugins (a sketch follows this list).
  • Multiple model endpoints: pre-configured connectors for a range of LLMs and VLMs as well as TTS/ASR services, enabling flexible model selection and fallbacks.
  • Hardware integration: plugin system and example connectors for ROS2, Zenoh, CycloneDDS, websockets, serial, and USB—Zenoh is recommended for new development.
  • Web-based debugging: WebSim (http://localhost:8000/) provides visual monitoring of agent perception, decisions, and action commands for easier development and debugging.
  • Example agents & workflows: includes ready-made agents (e.g., Spot) that demonstrate webcam-based perception, captioning, LLM-driven action planning, and simulated actuation.
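
As a rough illustration of the plugin-style extensibility described in the list above, the sketch below shows what a custom input source could look like. The class and method names (Percept, CameraCaptionInput, poll) are assumptions made for illustration, not the actual OM1 API.

```python
# Hypothetical sketch of an OM1-style input plugin; names and signatures
# are illustrative only and do not mirror the real OM1 interfaces.
from dataclasses import dataclass
import time


@dataclass
class Percept:
    """A single observation handed to the reasoning layer."""
    source: str        # e.g. "camera", "lidar", "web"
    timestamp: float   # seconds since the epoch
    text: str          # natural-language summary fed to the LLM/VLM


class CameraCaptionInput:
    """Polls a camera, captions each frame, and emits a Percept."""

    def __init__(self, caption_fn):
        # caption_fn: callable turning a raw frame into a short caption,
        # e.g. a VLM endpoint wrapped in a helper function.
        self.caption_fn = caption_fn

    def poll(self, frame) -> Percept:
        caption = self.caption_fn(frame)
        return Percept(source="camera", timestamp=time.time(), text=caption)
```

In OM1 itself, inputs of this kind are declared in the agent's JSON5 configuration rather than instantiated by hand, so new sensors can be added without touching the core runtime.
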
Architecture & Extensibility

OM1 is structured around configurable JSON5 agent definitions that tie inputs -> reasoning (LLM/VLM) -> actions. Hardware adapters implement a hardware abstraction layer (HAL) that accepts elemental commands such as move(x,y,z), pick_up(), or expressive actions (smile, wave). The repo provides examples for integrating with external SDKs (e.g., Unitree SDK) and guidance for creating Dockerized stacks and middleware bridges.
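
To make the HAL idea concrete, here is a minimal sketch of a hardware adapter exposing elemental commands. The ActionConnector interface and the SimulatedQuadruped backend are hypothetical names chosen for illustration; a real adapter would target ROS2, Zenoh, or CycloneDDS rather than printing.

```python
# Hypothetical sketch of a hardware-abstraction-layer (HAL) adapter.
from abc import ABC, abstractmethod


class ActionConnector(ABC):
    """Translates high-level agent decisions into elemental commands."""

    @abstractmethod
    def move(self, x: float, y: float, z: float) -> None: ...

    @abstractmethod
    def express(self, gesture: str) -> None: ...


class SimulatedQuadruped(ActionConnector):
    """Stand-in backend that logs commands instead of driving hardware."""

    def move(self, x: float, y: float, z: float) -> None:
        print(f"move -> x={x:.2f} y={y:.2f} z={z:.2f}")

    def express(self, gesture: str) -> None:
        print(f"gesture -> {gesture}")  # e.g. "smile" or "wave"


if __name__ == "__main__":
    robot = SimulatedQuadruped()
    robot.move(1.0, 0.0, 0.0)   # elemental move(x, y, z) command
    robot.express("wave")       # expressive action
```

Swapping the simulated backend for a Unitree SDK or ROS2 implementation is the kind of hardware-specific extension the plugin system is designed for.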

Getting Started (high level)
  1. Clone the repository and initialize submodules.
  2. Use the uv package manager to create the virtual environment and install dependencies.
  3. Configure an OpenMind API key (obtained from the OpenMind portal) and update the agent config (e.g., spot.json5) for ASR/TTS and model endpoints; a config sketch appears below.
  4. Launch an example agent (e.g., uv run src/run.py spot) and inspect behavior in WebSim.

Detailed installation notes are included for macOS and Linux (portaudio, ffmpeg, etc.).
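
The exact schema of an agent definition is set by the repository's example configs; purely as an assumption-laden sketch of the kind of fields step 3 touches, a spot.json5-style file might resemble the following (all keys and values here are illustrative placeholders, not the real schema).

```json5
// Hypothetical sketch of an agent config in the spirit of spot.json5.
// Field names and values are illustrative assumptions; consult the
// repository's example configs for the actual keys.
{
  name: "spot",
  inputs: [
    { type: "camera" },  // e.g. webcam frames captioned by a VLM
    { type: "asr" },     // speech-to-text stream
  ],
  llm: {
    endpoint: "<model endpoint>",
    api_key: "<OpenMind API key from the portal>",
  },
  tts: { endpoint: "<TTS endpoint>" },
  actions: ["move", "speak", "emote"],  // elemental commands exposed by the HAL
}
```

Once the config points at valid endpoints, launching the agent with uv run src/run.py spot and opening WebSim at http://localhost:8000/ lets you watch perception, decisions, and action commands in real time.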

Robotics & Full Autonomy

OM1 supports full autonomy workflows when combined with companion repositories (unitree-sdk, om1-avatar, om1-video-processor). For Unitree robots, the project documents a four-service loop covering perception, SLAM/navigation (via unitree_sdk), a UI/avatar front-end, and video processing, demonstrating end-to-end autonomy.

Use Cases
  • Research & prototyping of multimodal robot agents that combine LLM reasoning with perception and actuation.
  • Educational platforms for teaching robotics and embodied AI using TurtleBot-like robots and simulators.
  • Production robot integrations where a modular runtime can be extended to hardware-specific SDKs and middleware.

License & Contribution

OM1 is licensed under the MIT License. Contributors are asked to follow the CONTRIBUTING.md guidance in the repo when submitting pull requests.

Information

  • Website: github.com
  • Authors: OpenMind
  • Published date: 2025/01/08
