Large-scale, high-quality egocentric video paired with dense, action-aligned control signals is rare. This dataset addresses that gap by rendering professional Counter-Strike 2 (CS2) match replays into long, contiguous first-person clips, each synchronized with exact per-frame controls and camera pose.
What Sets It Apart
- Scale and temporal horizons: 600k+ player-round clips totaling 10k+ hours. Many clips span a full round (60–90 s), so models can learn long-horizon tactical structure rather than only short motion primitives.
- Dense, causally aligned controls: per-frame keyboard actions (W/A/S/D, jump, crouch, run, fire/secondary flags), mouse x/y deltas, world position (x/y/z), and camera yaw/pitch, all synchronized to the 48 fps video. This makes the data well suited to action-conditioned video generation and interactive world models.
- Reproducible pipeline and convenient packaging: clips are rendered at 720p and 48 fps with the HUD and weapon hidden, then stored in ~2 GB WebDataset tar shards (video plus a Parquet sidecar) alongside an index. An open-source renderer can reproduce or extend the corpus from CS2 demo (.dem) files.
- Multi-agent context: every player in each match is captured, with shared round and map metadata, enabling causal multi-agent analyses (for example, how one agent's actions change another agent's view).
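A minimal sketch of reading one shard with only the Python standard library, assuming the standard WebDataset convention that files sharing a basename prefix (the part before the first `.`) form one sample. The member names used here (`clip_a.mp4`, `clip_a.parquet`) and the helpers `iter_samples`, `frame_to_seconds`, and `frames_for_duration` are hypothetical illustrations, not part of the dataset's tooling; check the index and a real shard for the actual layout. The 48 fps constant comes from the description above.

```python
import tarfile
from typing import Dict, Iterator, Optional, Tuple

FPS = 48  # frame rate of the rendered clips, per the dataset description


def frame_to_seconds(frame_idx: int, fps: int = FPS) -> float:
    """Time in seconds from clip start for a 0-based frame index."""
    return frame_idx / fps


def frames_for_duration(seconds: float, fps: int = FPS) -> int:
    """Number of frames covering `seconds` of video at `fps`."""
    return round(seconds * fps)


def iter_samples(tar: tarfile.TarFile) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Yield (key, {extension: bytes}) for each sample in a WebDataset-style tar.

    WebDataset convention: the key is the member path up to the first '.'
    in its basename, and everything after that dot is the extension.
    Assumes all members of one sample are stored contiguously in the archive
    (a key that reappears later would be yielded as a second sample).
    """
    current_key: Optional[str] = None
    current: Dict[str, bytes] = {}
    for member in tar:
        if not member.isfile():
            continue  # skip directories, links, and other non-regular entries
        dirname, _, basename = member.name.rpartition("/")
        stem, _, ext = basename.partition(".")
        key = f"{dirname}/{stem}" if dirname else stem
        if key != current_key and current:
            # A new key starts: emit the sample collected so far.
            yield current_key, current
            current = {}
        current_key = key
        current[ext] = tar.extractfile(member).read()
    if current:
        yield current_key, current
```

Usage would look like `with tarfile.open(path_to_shard) as tar: for key, sample in iter_samples(tar): ...`. The sidecar bytes could then be decoded with a Parquet reader such as `pyarrow.parquet.read_table(io.BytesIO(sample["parquet"]))`; because the sidecar's column names are not documented here, inspect its schema rather than assuming them. At 48 fps, a full 60–90 s round corresponds to 2,880–4,320 frames, which is why per-frame controls should be indexed by frame number rather than by wall-clock time.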
Who It's For and Tradeoffs
This dataset is a good fit if you need large-scale, visually rich egocentric data (synthetic but realistic) for world modeling, action-conditioned video prediction, imitation learning, egocentric navigation priors, or multi-agent behavior studies. It is less suitable if you need data licensed for commercial use (the dataset is released under CC BY-NC 4.0), the raw competitive demo files (.dem) rather than rendered clips, or real-world camera noise and photorealism beyond CS2's rendering style. If you need custom maps, different visual settings, or additional annotations, consider running the provided renderer yourself.
