Most robot RL stacks assume a GPU-dominant workflow in which both simulation and policy computation live on accelerators. UniLab flips that assumption: it treats physics simulation as a CPU-parallel workload and moves sampled transitions into a shared-memory buffer that an accelerator-side learner consumes. That separation lets large-scale robot RL runs scale simulation across many CPU threads while keeping training efficient on CUDA, MPS, ROCm or XPU devices.
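The producer/consumer split can be made concrete with a minimal sketch of a transition ring buffer that lives in OS shared memory, built on Python's standard `multiprocessing.shared_memory` module. This is not UniLab's `SharedReplayBuffer`. The class `SharedTransitionRing`, the row layout (next observation, action, reward) and the toy `simulate_worker` are hypothetical. The sketch assumes a single writer and omits locking. In a real split, the learner would run in another process, attach to the segment by name, and copy sampled batches to its device.

```python
from multiprocessing import shared_memory
import random


class SharedTransitionRing:
    """Fixed-capacity ring of float64 transition rows in OS shared memory.

    Hypothetical sketch (not UniLab's SharedReplayBuffer): a CPU simulation
    worker pushes rows and a learner samples minibatches. Another process could
    attach to the same segment with shared_memory.SharedMemory(name=ring.name).
    """

    def __init__(self, capacity: int, row_width: int):
        self.capacity = capacity
        self.row_width = row_width
        nbytes = 8 + 8 * capacity * row_width  # 8-byte write counter + float64 rows
        self.shm = shared_memory.SharedMemory(create=True, size=nbytes)
        # Bytes [0, 8) hold a signed 64-bit count of rows ever written.
        self.counter = self.shm.buf[:8].cast("q")
        # The remaining bytes are viewed as a flat array of float64 values.
        self.data = self.shm.buf[8:nbytes].cast("d")
        self.counter[0] = 0

    @property
    def name(self) -> str:
        return self.shm.name

    def push(self, row) -> None:
        """Write one row, overwriting the oldest slot once full (single writer, no lock)."""
        if len(row) != self.row_width:
            raise ValueError("row has wrong width")
        n = self.counter[0]
        start = (n % self.capacity) * self.row_width
        for i, value in enumerate(row):
            self.data[start + i] = float(value)
        self.counter[0] = n + 1

    def __len__(self) -> int:
        return min(self.counter[0], self.capacity)

    def sample(self, batch_size: int, rng: random.Random):
        """Draw rows uniformly, with replacement, from the filled part of the ring."""
        filled = len(self)
        if filled == 0:
            raise ValueError("buffer is empty")
        w = self.row_width
        picks = [rng.randrange(filled) for _ in range(batch_size)]
        return [self.data[i * w:(i + 1) * w].tolist() for i in picks]

    def close(self) -> None:
        # Views must be released first, otherwise SharedMemory.close() raises BufferError.
        self.counter.release()
        self.data.release()
        self.shm.close()
        self.shm.unlink()  # only the creating process should unlink the segment


def simulate_worker(ring: SharedTransitionRing, steps: int, seed: int) -> None:
    """Toy stand-in for a CPU physics worker: a 1-D point mass with random pushes."""
    rng = random.Random(seed)
    x, v, dt = 1.0, 0.0, 0.05
    for _ in range(steps):
        action = rng.uniform(-1.0, 1.0)
        v += action * dt
        x += v * dt
        ring.push([x, v, action, -x * x])  # row = (next_x, next_v, action, reward)


ring = SharedTransitionRing(capacity=128, row_width=4)
simulate_worker(ring, steps=200, seed=0)
batch = ring.sample(batch_size=32, rng=random.Random(1))
print(len(ring), len(batch), len(batch[0]))  # 128 32 4
ring.close()
```

In this sketch the writer and reader share only a segment name and a fixed row layout, so the simulation side never needs to import the learner's accelerator stack.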
What Sets It Apart
- Heterogeneous runtime architecture: decouples CPU-based physics (MuJoCoUni, MotrixSim) from policy learners running on accelerators via a Unified Shared Memory SharedReplayBuffer, so simulation throughput scales with CPU cores while training remains accelerator-optimized.
- Backend-agnostic task owners and Hydra-driven configs: tasks, rewards, backends and algorithms are selected via owner YAMLs, so switching backends (mujoco/motrix) or algorithms (PPO, APPO, SAC, TD3, FlashSAC, HORA/HIM-PPO) follows the same CLI pattern.
- Cross-platform accelerator support and tooling: documented setup flows for Linux (CUDA/ROCm/XPU) and macOS (Apple Silicon/MPS), a unified CLI driven through uv, and demo/playback commands that download checkpoints from Hugging Face on first run.
- Robot-specific focus: rather than being a generic RL library, it ships demo presets (dance, wallflip, inhandgrasp, locomani) and task-specific optimizations such as grasp caches and motion tracking.
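Owner-YAML selection can be pictured as Hydra-style `group=option` overrides resolved against a registry of known backends and algorithms. The sketch below uses only the standard library, not Hydra itself, and the keys (`backend`, `algo`, `num_envs`), option spellings, defaults and the task name `my_task` are assumptions for illustration, not UniLab's actual config schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical option names; UniLab's real owner YAMLs may spell them differently.
REGISTRY = {
    "backend": {"mujoco", "motrix"},
    "algo": {"ppo", "appo", "sac", "td3", "flashsac"},
}
DEFAULTS = {"backend": "mujoco", "algo": "ppo"}
DEFAULT_NUM_ENVS = 64  # hypothetical default


@dataclass(frozen=True)
class RunConfig:
    task: str
    backend: str
    algo: str
    num_envs: int


def resolve(task: str, overrides: List[str]) -> RunConfig:
    """Apply Hydra-style `key=value` overrides on top of the defaults."""
    chosen = dict(DEFAULTS)
    num_envs = DEFAULT_NUM_ENVS
    for token in overrides:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        if key in REGISTRY:
            # Fail fast on unknown choices instead of at simulation time.
            if value not in REGISTRY[key]:
                raise ValueError(f"unknown {key} {value!r}; choose from {sorted(REGISTRY[key])}")
            chosen[key] = value
        elif key == "num_envs":
            num_envs = int(value)
        else:
            raise ValueError(f"unknown override key {key!r}")
    return RunConfig(task, chosen["backend"], chosen["algo"], num_envs)


cfg = resolve("my_task", ["backend=motrix", "algo=sac"])
print(cfg)  # RunConfig(task='my_task', backend='motrix', algo='sac', num_envs=64)
```

Under this model, switching from MuJoCo to MotrixSim or from PPO to SAC is a one-token change, and a misspelled option is rejected before any simulation starts.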
Who It's For & Trade-offs
Great fit if you need to run many parallel physics streams for robot or embodied RL but want to keep training on modern accelerators: for example, labs with multi-core servers that prefer CPU physics backends, or teams evaluating MotrixSim/MuJoCoUni for contact-rich tasks. It also helps when simulation licensing or acceleration constraints make GPU-based simulation impractical.
Look elsewhere if you want a turnkey single-process pipeline in which simulator and learner share the same GPU, or a minimal, dependency-light RL framework: UniLab expects a nontrivial setup (simulation backends, accelerator drivers, uv tooling) and targets research and engineering teams rather than absolute beginners.
Where It Fits
UniLab sits between simulator ecosystems (MuJoCo, MotrixSim) and policy-training frameworks: it is an orchestration and runtime layer that connects physics backends to established RL algorithms and accelerator toolchains. It complements accelerator-first simulators by offering a CPU-dominated simulation path that can improve hardware utilization on CPU-heavy clusters.
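An orchestration layer of this kind implies that the rollout loop depends only on a small backend interface, so different physics adapters can be swapped without touching it. Here is a sketch under that assumption: `PhysicsBackend`, `ToyPointMass` and `rollout_parallel` are hypothetical names, not UniLab APIs. The toy backend is pure Python, so the thread pool shows the structure rather than a speedup; real CPU parallelism would need native engines that release the GIL, or separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Protocol, Sequence, Tuple


class PhysicsBackend(Protocol):
    """Hypothetical minimal interface a CPU physics adapter would expose."""

    def reset(self, seed: int) -> List[float]: ...

    def step(self, action: float) -> Tuple[List[float], float]: ...


class ToyPointMass:
    """Toy backend: explicit-Euler 1-D point mass standing in for a real engine."""

    def __init__(self, dt: float = 0.05):
        self.dt = dt
        self.x = 0.0
        self.v = 0.0

    def reset(self, seed: int) -> List[float]:
        self.x, self.v = float(seed % 3), 0.0  # deterministic start position per seed
        return [self.x, self.v]

    def step(self, action: float) -> Tuple[List[float], float]:
        self.v += action * self.dt
        self.x += self.v * self.dt
        return [self.x, self.v], -self.x * self.x  # reward: stay near the origin


def rollout_parallel(
    envs: Sequence[PhysicsBackend],
    policy: Callable[[List[float]], float],
    steps: int,
    workers: int = 4,
) -> List[float]:
    """Run each env for `steps` steps as one pool task; return per-env returns in order."""

    def run_one(indexed_env: Tuple[int, PhysicsBackend]) -> float:
        i, env = indexed_env
        obs = env.reset(seed=i)
        total = 0.0
        for _ in range(steps):
            obs, reward = env.step(policy(obs))
            total += reward
        return total

    # pool.map preserves input order, so returns line up with envs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, enumerate(envs)))


envs = [ToyPointMass() for _ in range(8)]
returns = rollout_parallel(envs, policy=lambda obs: -obs[0] - obs[1], steps=100)
print(len(returns))  # 8
```

Because `rollout_parallel` sees only `reset` and `step`, a second backend adapter with the same two methods would drop in without changes to the loop.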
Practical Notes
The repo uses a unified CLI (via uv) for demos and train/eval workflows, and relies on assets hosted on Hugging Face for demo caches and pre-trained checkpoints. The project provides extensive docs and a paper (arXiv:2605.30313) describing the architecture and evaluations.
