Adds interleaved text–image generation to existing image generators via a multi-agent pipeline: a planner sequences stepwise instructions, a critic detects failed steps and refines them, and single-step RL (GRPO, Group Relative Policy Optimization) reinforces per-step corrections. Suited to visual narratives and embodied guidance.
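
The pipeline summarized above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The `plan`, `generate`, and `critique` callables, the `max_retries` limit, and the idea that the critic returns a revised instruction are all assumptions made for the example. The advantage function shows the group-relative normalization GRPO is generally known for (each reward minus the group mean, divided by the group standard deviation). How rewards are defined per step is not stated in the summary.

```python
from statistics import mean, pstdev
from typing import Callable, List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sampled group.

    `rewards` holds the scores of several candidate corrections sampled for
    the SAME step (assumed setup; the reward source is hypothetical).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def run_pipeline(
    goal: str,
    plan: Callable[[str], List[str]],                  # hypothetical planner agent
    generate: Callable[[str], str],                    # hypothetical wrapper around an image generator
    critique: Callable[[str, str], Tuple[bool, str]],  # hypothetical critic: (ok, revised instruction)
    max_retries: int = 2,                              # assumed retry budget per step
) -> List[Tuple[str, str]]:
    """Plan stepwise instructions, generate an image per step, let the critic refine failures."""
    outputs: List[Tuple[str, str]] = []
    for instruction in plan(goal):
        image = generate(instruction)
        for _ in range(max_retries):
            ok, revised = critique(instruction, image)
            if ok:
                break
            # The critic flagged a failure: retry this step with its revised instruction.
            instruction = revised
            image = generate(instruction)
        outputs.append((instruction, image))
    return outputs


if __name__ == "__main__":
    # Toy stand-ins for the three agents, for illustration only.
    steps = run_pipeline(
        "fold a paper crane",
        plan=lambda g: ["step 1", "step 2"],
        generate=lambda s: f"img({s})",
        critique=lambda s, img: (True, s) if s == "step 1" or "fixed" in s else (False, s + " fixed"),
    )
    print(steps)
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Treating each step as its own one-step episode keeps the reward attribution local: candidates are compared only against other candidates for the same step, so no value function or multi-step credit assignment is needed in this sketch.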