AIAny - Gaming Dataset (gaming-1)

Large, focused gameplay corpora are valuable because raw streaming footage (overlays, desktop captures, launchers) adds noise that hampers model training. This dataset supplies about 494.7 hours of gameplay-only clips, trimmed to remove launchers, desktop screens, and watching/streaming segments while keeping in-game menus, lobbies, loading screens, and cutscenes, so it can be used directly for vision-action research and gameplay analysis.

What Sets It Apart

Session-oriented layout: each workflow folder contains a 30fps H.264 clip.mp4 plus events.json (input/app events rebased to the clip timeline), frame_events.json (a per-frame view of the events), and metadata.json (a summary of duration and event counts), so you get aligned video and event traces out of the box.
Wide game coverage with concentrated scale: 776 workflows across 168 distinct games totaling 494.7 hours; top titles include Valorant (102.2 h), Minecraft (41.3 h), and GTA V (34.7 h), which supports both breadth and per-title depth.
Trimmed to pure gameplay: non-gameplay content (launchers, desktop, watching/streaming) has been removed, reducing label noise for behavior cloning, imitation learning, and supervised vision-action training.
Practical formats: 30fps CFR H.264 video and NDJSON event files enable straightforward ingestion with standard tools (datasets, pandas, polars) and simple conversion pipelines.
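
To illustrate the layout described above, here is a minimal Python sketch that loads one workflow folder and maps each event onto a frame of the 30fps clip. It is not taken from the dataset card and makes several assumptions: that events.json holds one JSON object per line (NDJSON), that metadata.json is a single JSON object, and that each event carries a clip-relative timestamp in seconds under a key named "t". The key name "t" and the helper names are hypothetical, so check the real files before relying on them.

```python
import json
import math
from pathlib import Path

FPS = 30  # constant frame rate stated for clip.mp4


def parse_ndjson(text):
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def frame_index(t_seconds, fps=FPS):
    """Map a clip-relative time in seconds to a 0-based frame index.

    The small epsilon guards against float rounding (0.7 * 30 evaluates
    to 20.999999999999996, which should still land on frame 21).
    """
    return math.floor(t_seconds * fps + 1e-9)


def load_workflow(folder, time_key="t"):
    """Load metadata and events from one workflow folder.

    ASSUMPTION: `time_key` names a per-event timestamp in seconds on the
    clip timeline. The default "t" is a placeholder, not a documented field.
    """
    folder = Path(folder)
    metadata = json.loads((folder / "metadata.json").read_text(encoding="utf-8"))
    events = parse_ndjson((folder / "events.json").read_text(encoding="utf-8"))
    for event in events:
        if time_key in event:
            # Attach the frame this event falls on, for video/event alignment.
            event["frame"] = frame_index(event[time_key])
    return metadata, events
```

Calling load_workflow on a workflow folder returns the metadata dictionary and the event list, with a "frame" entry added wherever the assumed timestamp key is present. If the dataset's frame_events.json already provides the per-frame view you need, reading that file directly is likely simpler than recomputing the mapping.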

Who it's for — and tradeoffs

Great fit if you need medium-scale, gameplay-focused video + input traces for training vision-action models, behavior cloning, imitation learning, gameplay understanding, or dataset augmentation. The dataset's per-session organization and event alignment lower preprocessing overhead.

Look elsewhere if you require an explicit open license (the Hugging Face card lists no license), ultra-high frame-rate / hardware-synchronized HID traces, or full desktop recordings (this collection is trimmed to in-game footage). Also note the dataset is Windows-heavy (~489.3h) with limited macOS coverage (~5.3h), which may bias platform-specific behaviors.
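
To put the platform imbalance in numbers, the short Python sketch below computes each platform's share of the recorded hours. It uses only the approximate figures quoted above (about 489.3 h Windows, about 5.3 h macOS); the function name is illustrative.

```python
# Approximate per-platform hours as quoted in the listing.
platform_hours = {"Windows": 489.3, "macOS": 5.3}


def platform_shares(hours):
    """Return each platform's fraction of the total recorded hours."""
    total = sum(hours.values())
    return {name: h / total for name, h in hours.items()}


shares = platform_shares(platform_hours)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these figures Windows accounts for roughly 99% of the footage, so any per-platform evaluation on macOS will rest on a very small sample.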
