Large, focused gameplay corpora are valuable because raw streaming footage is noisy: overlays, desktop views, and launchers add frames that do not correspond to play and confound model learning. This dataset supplies 494.7 hours of gameplay-only clips, trimmed to remove launchers, desktop screens, and watching/streaming segments while keeping in-game menus, lobbies, loading screens, and cutscenes. That makes it directly usable for vision-action research and gameplay analysis.
What Sets It Apart
- Session-oriented layout: each workflow folder contains a 30 fps H.264 clip.mp4, events.json (input and app events rebased to the clip timeline), frame_events.json (a per-frame event view), and metadata.json (duration and event counts), so aligned video and event traces are available out of the box.
- Wide game coverage with concentrated scale: 776 workflows across 168 distinct games totaling 494.7 hours; top titles include Valorant (102.2 h), Minecraft (41.3 h), and GTA V (34.7 h) — useful for both breadth and per-title depth.
- Trimmed to in-game footage: non-gameplay content (launchers, desktop, watching/streaming) is removed, which reduces label noise for behavior cloning, imitation learning, and supervised vision-action training.
- Practical formats: 30 fps constant-frame-rate (CFR) H.264 video and NDJSON event files can be ingested with standard tools (Hugging Face datasets, pandas, polars) and converted with simple pipelines.
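The session layout above can be read with the Python standard library alone. The sketch below is a minimal illustration, not the dataset's official loader. It assumes that events.json holds one JSON object per line (NDJSON), that metadata.json is a single JSON object, and that each event carries a clip-relative timestamp in seconds under a key named `t`. The key names, the time unit, and the helpers `read_ndjson`, `frame_index`, and `load_session` are hypothetical, so check them against a real session folder before relying on this.

```python
import json
import tempfile
from pathlib import Path

FPS = 30  # constant frame rate stated for clip.mp4


def read_ndjson(path):
    """Parse a newline-delimited JSON file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def frame_index(time_s, fps=FPS):
    """Map a non-negative clip-relative time in seconds to a 0-based frame index."""
    return int(time_s * fps)


def load_session(folder, time_key="t"):
    """Load one workflow folder; `time_key` is an assumed field name, not a documented one."""
    folder = Path(folder)
    metadata = json.loads((folder / "metadata.json").read_text(encoding="utf-8"))
    events = read_ndjson(folder / "events.json")
    for event in events:
        # Attach the frame each event falls on, under the CFR assumption.
        event["frame"] = frame_index(event[time_key])
    return metadata, events


# Demo on synthetic files, because the real field names are not confirmed here.
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp)
    (session / "metadata.json").write_text(
        json.dumps({"duration_s": 2.0}), encoding="utf-8"
    )
    lines = [
        json.dumps({"t": 0.0, "type": "key_down"}),
        json.dumps({"t": 0.5, "type": "key_up"}),
    ]
    (session / "events.json").write_text("\n".join(lines) + "\n", encoding="utf-8")
    metadata, events = load_session(session)

print([event["frame"] for event in events])  # [0, 15]
```

Because the video is constant frame rate, a timestamp maps to a frame by a single multiplication. If frame_events.json already provides the per-frame view you need, prefer it over recomputing indices this way.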
Who it's for — and tradeoffs
Great fit if you need medium-scale, gameplay-focused video + input traces for training vision-action models, behavior cloning, imitation learning, gameplay understanding, or dataset augmentation. The dataset's per-session organization and event alignment lower preprocessing overhead.
Look elsewhere if you require an explicit open license (the Hugging Face card lists no license), ultra-high-frame-rate or hardware-synchronized HID traces, or full desktop recordings (this collection is trimmed to in-game footage). Also note that the dataset is Windows-heavy (~489.3 h) with limited macOS coverage (~5.3 h), which may bias models toward Windows-specific behavior.
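To put the skew in proportion, the figures quoted in this overview can be turned into shares with a few lines of arithmetic. This is only a sketch over the rounded numbers above; the helper name `shares` is illustrative, and the two platform totals sum to 494.6 h rather than 494.7 h, presumably because of rounding.

```python
TOTAL_HOURS = 494.7
WORKFLOWS = 776
TOP_TITLE_HOURS = {"Valorant": 102.2, "Minecraft": 41.3, "GTA V": 34.7}
PLATFORM_HOURS = {"Windows": 489.3, "macOS": 5.3}


def shares(hours_by_name, total_hours):
    """Return each entry's fraction of the total hours."""
    return {name: hours / total_hours for name, hours in hours_by_name.items()}


platform_share = shares(PLATFORM_HOURS, TOTAL_HOURS)
top_three_share = sum(TOP_TITLE_HOURS.values()) / TOTAL_HOURS
mean_minutes = TOTAL_HOURS / WORKFLOWS * 60  # average workflow length in minutes

for name, share in platform_share.items():
    print(f"{name}: {share:.1%}")
print(f"Top three titles: {top_three_share:.1%}")
print(f"Mean workflow length: {mean_minutes:.0f} min")
```

On these numbers, Windows accounts for roughly 98.9% of the hours and macOS for about 1.1%, the three largest titles make up about 36% of the corpus, and the average workflow runs about 38 minutes. Splits or sampling schemes that ignore this skew will mostly reflect Windows sessions of a few popular games.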
