Why this matters
The field is shifting from lab-only scripted trials to messy, diverse real-world demonstrations; HIW-500 supplies long-horizon, whole-body teleoperation data captured inside real homes to let researchers train and evaluate policies on realistic clutter, lighting, and human operator variability.
What Sets It Apart
- Real-home whole-body teleop: human operators teleoperate a Unitree G1 to produce naturalistic mobile-manipulation and bimanual interaction trajectories rather than synthetic or lab-only runs, so learned policies see real distributional variation.
- Multimodal, synchronized traces: head/wrist video (RGB + IR stereo), 29-DoF joint/proprioceptive state, end-effector poses, IMU/odometry, action commands, and per-episode language annotations. These are useful for vision-language, imitation, hierarchical-policy, and skill-discovery experiments.
- Scale + accessibility: V1 offers 500+ hours (≈23K episodes, 148K+ subtask annotations). Data is available as raw ROS bag/MCAP (10+ TB) and as a re-encoded LeRobot version (~2.15 TB) that is easier to stream and leaves the trajectories unchanged.
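To put the headline numbers in perspective, here is a back-of-the-envelope sketch in Python. It takes the rounded totals quoted above as inputs, treats "500+" and "10+" as lower bounds, and assumes 1 TB = 1000 GB; the function name `dataset_summary` and its dictionary keys are illustrative, not part of any HIW-500 tooling. The results are rough averages only and say nothing about how episode lengths are actually distributed.

```python
# Rounded totals quoted in the text; "500+" and "10+" are treated as lower bounds.
HOURS = 500          # total recorded hours
EPISODES = 23_000    # approximate episode count
SUBTASKS = 148_000   # approximate subtask annotation count
RAW_TB = 10.0        # raw ROS bag/MCAP size
LEROBOT_TB = 2.15    # re-encoded LeRobot size


def dataset_summary(hours, episodes, subtasks, raw_tb, lerobot_tb):
    """Derive rough per-episode and per-hour averages from headline totals."""
    return {
        "mean_episode_s": hours * 3600 / episodes,        # average episode length in seconds
        "subtasks_per_episode": subtasks / episodes,      # average annotations per episode
        "raw_gb_per_hour": raw_tb * 1000 / hours,         # assumes 1 TB = 1000 GB
        "lerobot_gb_per_hour": lerobot_tb * 1000 / hours,
        "size_ratio": raw_tb / lerobot_tb,                # raw size relative to LeRobot size
    }


summary = dataset_summary(HOURS, EPISODES, SUBTASKS, RAW_TB, LEROBOT_TB)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the average episode runs a little over a minute with about six annotated subtasks, and the LeRobot version needs roughly a fifth of the storage per recorded hour, which is what makes streaming a subset practical.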
Who It's For and Tradeoffs
Great fit if you develop or evaluate embodied/robotic learning methods that need in-the-wild diversity (imitation learning, hierarchical policies, multimodal perception-to-action models). The dataset is practical for groups that can handle multi-terabyte storage or stream LeRobot-formatted chunks.
Look elsewhere if you need high-resolution visual data, non-Unitree platforms, or purely simulated benchmarks: HIW-500 is tied to Unitree G1 sensors and household tasks collected in Southeast Asian homes, so domain and hardware biases apply.
The public release is CC BY 4.0, enabling research and many commercial uses but requiring attribution.
