Automated data storytelling risks producing plausible but unverifiable claims. This work shows a practical way to automate journalism while keeping every factual element traceable: a multi-agent pipeline synthesizes statistics, narrative, and visuals and tags every sentence and asset with upstream evidence so claims can be re-executed and audited.
Key Findings
- A seven-role newsroom pipeline (Detective, Analyst, Editor, Designer, Programmer, Auditor, Inspector) breaks the task into specialized artifacts, each tagged with provenance, which makes the final article reproducible and easier to audit.
- The Inspector mechanism binds text, numbers, charts, and assets to concrete evidence (data, code, external URLs) and supports automated re-execution checks. Readers and editors can therefore verify claims programmatically, which reduces trust friction in data-driven reporting.
- The system generates multimodal outputs (interactive maps, charts, audio/video where relevant) instead of static text-only pieces. Stories can thus better match reader needs and data modalities, improving comprehension for geographically rich or media-rich topics.
- Empirical evaluation on 18 paired articles shows the agent pipeline excels at transparency and verifiability but lags human-authored pieces on editorial angle, creative design, and final presentation. The system is therefore a practical collaborator that augments journalistic workflows rather than a replacement for reporters.
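
To make the evidence-binding idea in the findings above concrete, here is a minimal sketch of a claim tied to its upstream data and analysis step, followed by a re-execution check. It is an illustration under stated assumptions, not the system's actual implementation: the names `Evidence`, `Claim`, `recheck`, and `audit`, the sample rent figures, and the example URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Evidence:
    """Hypothetical provenance record attached to one claim."""
    dataset: Sequence[float]                      # upstream data the claim rests on
    compute: Callable[[Sequence[float]], float]   # analysis step that can be re-run
    source_url: str                               # where the data came from


@dataclass(frozen=True)
class Claim:
    """One sentence of the article plus the number it asserts."""
    sentence: str
    stated_value: float
    evidence: Evidence


def recheck(claim: Claim, tolerance: float = 1e-9) -> bool:
    """Re-execute the analysis and compare it with the value stated in the text."""
    recomputed = claim.evidence.compute(claim.evidence.dataset)
    return abs(recomputed - claim.stated_value) <= tolerance


def audit(claims: Sequence[Claim]) -> List[str]:
    """Return the sentences whose stated value does not match the evidence."""
    return [c.sentence for c in claims if not recheck(c)]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Illustrative data only; the URL is a placeholder, not a real source.
rents = (900.0, 1100.0, 1300.0)
evidence = Evidence(dataset=rents, compute=mean,
                    source_url="https://example.org/rents.csv")

good = Claim("Average rent is 1100.", 1100.0, evidence)
bad = Claim("Average rent is 1200.", 1200.0, evidence)

failed = audit([good, bad])
print(failed)
```

In this sketch the audit flags only the second sentence, because re-running `mean` on the attached data does not reproduce the stated figure. A real pipeline would presumably bind charts and other assets in the same way, with the analysis step stored as executable code rather than an in-memory function.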
Who it's for and tradeoffs
Great fit if you run a newsroom, data journalism project, or research group that needs reproducible, evidence-grounded multimedia stories and can provide curated datasets and modest engineering resources. Look elsewhere if your priority is investigative reporting that requires deep human sources, nuanced editorial judgment, or bespoke creative design: the pipeline emphasizes verifiability and multimodal automation over editorial artistry. The approach also depends on LLMs, executable analysis code, and integration engineering, so expect implementation overhead and the usual limitations of model-driven generation.
