Assesses whether coding agents can generate complete, playable games end-to-end inside the Godot engine. Implements an interaction-grounded evaluation (replayed demonstrations combined with rubric-guided multimodal judging) across 140 tasks spanning 15 game families; top agents score about 41%.