AIAny - SceneFun3D

Introduction

Most 3D scene datasets label whole objects, but agents need the small interactive parts and how to manipulate them. SceneFun3D closes that gap by pairing point-accurate 3D masks of interactive elements with affordance labels, motion parameters, and natural-language task descriptions across high-fidelity scans.

What Sets It Apart

Fine-grained, manipulation-focused annotations: 14,867 interactive-element annotations across 710 Faro laser-scan scenes, annotated as point-index masks and exported as 3D detections.
Motion + affordance + language: each element carries a Gibsonian affordance (9 classes), motion type (translational/rotational), axis/origin vectors, and free-form task descriptions (10,913 elements with descriptions; 17,133 with rephrasings).
Multimodal alignment: per-scene high-res iPad RGB-D recordings (hi-res RGB, depth, poses, intrinsics) with the 3D elements projected into video frames; provided as a FiftyOne FO3D grouped dataset for visualization and benchmarking.
Benchmark focus: introduces three tasks (functionality segmentation, task-driven affordance grounding, and 3D motion estimation) that target robotics and embodied-AI manipulation research.
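To make the motion annotations concrete, here is a minimal sketch of how a motion type, axis, and origin could be applied to the points selected by a point-index mask. The function names, the argument layout, and the plain-tuple point representation are assumptions made for illustration; they are not the dataset's actual schema or toolkit API.

```python
import math

def normalize(v):
    """Return v scaled to unit length; a zero axis is rejected."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("motion axis must be non-zero")
    return tuple(c / n for c in v)

def translate(point, axis, distance):
    """Translational motion: slide the point along the unit axis."""
    a = normalize(axis)
    return tuple(p + distance * c for p, c in zip(point, a))

def rotate(point, axis, origin, angle_rad):
    """Rotational motion: rotate the point about the line through
    `origin` with direction `axis` (Rodrigues' rotation formula)."""
    k = normalize(axis)
    v = tuple(p - o for p, o in zip(point, origin))
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
        for i in range(3)
    )
    return tuple(r + o for r, o in zip(rotated, origin))

def apply_motion(points, mask_indices, motion_type, axis, origin, amount):
    """Move only the masked points; `amount` is a distance for
    translational motion and an angle in radians for rotational motion.
    (Hypothetical helper, not part of any SceneFun3D tooling.)"""
    moved = list(points)
    for i in mask_indices:
        if motion_type == "translational":
            moved[i] = translate(points[i], axis, amount)
        elif motion_type == "rotational":
            moved[i] = rotate(points[i], axis, origin, amount)
        else:
            raise ValueError(f"unknown motion type: {motion_type}")
    return moved
```

For example, a drawer handle would typically use the translational branch with the axis pointing out of the cabinet, while a door handle would use the rotational branch with the origin placed on the hinge line.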

Who It's For and Tradeoffs

Great fit if you build or evaluate robotics, embodied-AI, or vision models that must localize tiny interactive parts and predict how to act on them (e.g., pick-and-place, manipulation target selection, action grounding from language). Look elsewhere if you need only large-scale object-level semantics, need a license that permits commercial use (SceneFun3D inherits CC BY‑NC‑SA), or cannot handle high-resolution Faro scans and the cross-modal registration workflow. Practical limitations include test annotations withheld for benchmark use, the exclusion of poorly captured reflective elements, axis-aligned 3D boxes (no oriented box estimates), and nontrivial storage and registration needs for the laser-scan and iPad assets.
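The axis-aligned box limitation is easier to see with a small sketch. Assuming a scene is a list of (x, y, z) points and an annotation is a list of point indices (an assumed layout for illustration, not the dataset's documented format), an axis-aligned box is just the per-axis minimum and maximum of the masked points:

```python
def aabb_from_mask(points, mask_indices):
    """Return (center, size) of the axis-aligned box around the masked points.

    `points` is a sequence of (x, y, z) tuples and `mask_indices` selects
    the points belonging to one interactive element. Both names are
    hypothetical and chosen for this example only.
    """
    if not mask_indices:
        raise ValueError("mask must select at least one point")
    selected = [points[i] for i in mask_indices]
    mins = tuple(min(p[d] for p in selected) for d in range(3))
    maxs = tuple(max(p[d] for p in selected) for d in range(3))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size
```

Because the box follows the scene axes rather than the element's own orientation, a thin handle mounted diagonally would get a box much larger than the handle itself, which is the practical cost of not having oriented boxes.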
