Most robot learning datasets are collected in labs or with simulated platforms; real-home whole-body humanoid demonstrations at scale are rare. This LeRobot v3.0 export delivers 500+ hours of synchronized teleoperation recordings from Unitree G1s operating in 12 real homes, explicitly intended for imitation learning, mobile manipulation, bimanual skills and long-horizon household tasks.
What Sets It Apart
- Real-home whole-body teleoperation: human operators drove Unitree G1s across diverse Southeast Asian homes, yielding natural variation in layouts, lighting, clutter and operator style, which is useful for robustness and domain generalization. This is demonstration data for control policies rather than scripted lab trials.
- Multimodal, time-synced observations: head and wrist cameras (480p @ 30 FPS), 29-DoF joint states, end-effector poses, IMU, odometry, action traces and language annotations. The dataset includes hierarchical subtask labels (161 subtasks, 148K+ annotations) to support hierarchical policy learning and evaluation.
- Scaled and packaged for research: ~23.7K episodes and ~40.84M frames; the original raw corpus (more than 10 TB) was re-encoded into a LeRobot v3.0 release (~2.15 TB) for easier streaming and ML workflows (Parquet + chunked video). Distributed under CC BY 4.0, which permits research and commercial training with attribution.
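To put the headline figures in perspective, the sketch below derives a few per-episode numbers from them. It uses only the approximate values quoted above; the raw corpus is taken at its stated lower bound of 10 TB, and treating the frame count as indexed at the 30 FPS camera rate is an assumption, not something the release notes confirm. The function name `derived_stats` is illustrative only.

```python
# Approximate figures quoted in the dataset description.
EPISODES = 23_700
FRAMES = 40_840_000
ANNOTATIONS = 148_000
RAW_TB = 10.0       # stated lower bound of the raw corpus
RELEASE_TB = 2.15   # LeRobot v3.0 release size
FPS = 30            # assumption: frames are indexed at the camera rate


def derived_stats():
    """Return rough per-episode statistics implied by the quoted figures."""
    frames_per_episode = FRAMES / EPISODES
    return {
        "frames_per_episode": frames_per_episode,
        "seconds_per_episode": frames_per_episode / FPS,
        "annotations_per_episode": ANNOTATIONS / EPISODES,
        # Lower bound, because the raw size is itself a lower bound.
        "min_size_reduction_factor": RAW_TB / RELEASE_TB,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions an average episode runs a little under a minute and carries roughly six subtask annotations. These are averages over rounded figures, so treat them as order-of-magnitude guidance rather than exact dataset statistics.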
Who It's For and Tradeoffs
A good fit if you need large-scale, real-world humanoid demonstration data for imitation learning, multi-view perception, long-horizon task decomposition or transfer to embodied agents. The LeRobot format integrates with common ML stacks for streaming and policy training.
Look elsewhere if you need:
- Dense instrumented sensors not present here (e.g., high-res depth/Ti).
- Extremely lightweight downloads: the dataset is multi-terabyte even in LeRobot form.
- Non-teleoperated robot trajectories: this corpus consists of human-driven demonstrations on the Unitree G1.
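Because the release is multi-terabyte even in LeRobot form, it can help to budget storage before pulling a subset. The following is a minimal sketch that scales the quoted release size (~2.15 TB over ~23.7K episodes) linearly with episode count. It assumes episodes are roughly uniform in size, which real episodes will not be, and the helper `estimate_subset_gb` is hypothetical rather than part of any LeRobot tooling.

```python
RELEASE_TB = 2.15        # approximate LeRobot v3.0 release size
TOTAL_EPISODES = 23_700  # approximate episode count
GB_PER_TB = 1000         # decimal units, as storage vendors quote them


def estimate_subset_gb(n_episodes, release_tb=RELEASE_TB,
                       total_episodes=TOTAL_EPISODES):
    """Estimate the size in GB of a subset, assuming uniform episode size."""
    if not 0 <= n_episodes <= total_episodes:
        raise ValueError("n_episodes must be between 0 and total_episodes")
    return release_tb * GB_PER_TB * n_episodes / total_episodes


if __name__ == "__main__":
    for n in (100, 1_000, 5_000):
        print(f"{n} episodes: about {estimate_subset_gb(n):.1f} GB")
```

The linear estimate works out to roughly 90 MB per episode, so even a 1,000-episode pilot subset is on the order of 90 GB. Actual sizes depend on episode length and on how the chunked video files group episodes together.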
