
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Synthesizes shortcut-resistant search tasks to train deep search agents by controlling four shortcut risks across entity selection, evidence-graph construction, question formulation, and adversarial refinement. Produces training trajectories with longer pre-answer search and fewer shortcut patterns; code will be released on GitHub.

Introduction

Most synthetic search-task generators increase apparent difficulty by complicating graph structure, but models can still collapse the intended search process via cheaper identifying routes (shortcuts). FORT starts from the observation that structural complexity alone doesn't guarantee realized search difficulty and instead formalizes a shortcut-aware difficulty framework to diagnose and mitigate four concrete shortcut risks: evidence co-coverage, single-clue selectivity, exposed constants, and prior-knowledge binding.
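To make one of these risks concrete, here is a minimal sketch of how single-clue selectivity might be checked on a toy entity table. It is an illustration under stated assumptions, not the authors' method: the `(attribute, value)` clue representation, the entity table, and the function names `clue_candidates` and `single_clue_shortcuts` are all hypothetical, and the reading that a clue is a shortcut when it alone identifies the answer is an interpretation of the term.

```python
from typing import Dict, List, Set, Tuple

Clue = Tuple[str, object]              # hypothetical (attribute, value) pair
EntityTable = Dict[str, Dict[str, object]]


def clue_candidates(clue: Clue, entities: EntityTable) -> Set[str]:
    """Return the names of all entities that satisfy a single clue."""
    attr, value = clue
    return {name for name, attrs in entities.items() if attrs.get(attr) == value}


def single_clue_shortcuts(clues: List[Clue], entities: EntityTable, answer: str) -> List[Clue]:
    """Return the clues that, taken alone, narrow the candidates to exactly the answer."""
    return [c for c in clues if clue_candidates(c, entities) == {answer}]


# Toy data (invented for illustration only).
entities = {
    "A": {"country": "FR", "founded": 1900, "field": "physics"},
    "B": {"country": "FR", "founded": 1950, "field": "physics"},
    "C": {"country": "DE", "founded": 1900, "field": "chemistry"},
}

# Neither clue identifies "A" alone, so both must be combined.
safe = single_clue_shortcuts([("country", "FR"), ("founded", 1900)], entities, "A")

# The "field" clue identifies "C" alone, so the other clue can be skipped.
risky = single_clue_shortcuts([("founded", 1900), ("field", "chemistry")], entities, "C")
```

In this toy setup, a question whose clue list comes back non-empty could be answered from one clue, which is the kind of cheaper identifying route the paragraph above describes.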

Key Findings
  • FORT's shortcut-aware synthesis yields trajectories with longer pre-answer search costs and lower prior-shortcut rates than existing open-source deep-search datasets. As a result, training signals better reflect the intended multi-step search process rather than superficial cues.
  • By controlling shortcuts at four stages (entity selection, evidence-graph construction, question formulation, adversarial refinement), FORT reduces exploitable patterns that let agents bypass search. This means learned agents must actually acquire evidence across steps to answer.
  • Supervised fine-tuning on FORT-generated trajectories produced FORT-Searcher, which outperforms comparable-size open-source search agents on challenging deep-search benchmarks when trained with SFT only. This suggests that high-quality, shortcut-resistant synthetic data can substantially improve supervised training for search agents.

How it works (brief)

FORT instruments dataset synthesis with measurable trajectory signatures (solving cost, answer hit time, prior-shortcut rate) to detect realized shortcuts. It then applies targeted interventions during entity selection, evidence-graph creation, and question phrasing, followed by adversarial refinement to iteratively remove remaining shortcut patterns. The authors plan to release dataset generation code and FORT-Searcher code at RUCAIBox/FORT-Searcher on GitHub.
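The three trajectory signatures named above can be sketched as simple functions over logged trajectories. This is a rough illustration only: the summary does not give the exact definitions, so the `Step` and `Trajectory` types, the `"search"`/`"answer"` action labels, and the substring-based matching are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    action: str        # assumed labels: "search" or "answer"
    query: str         # text the agent emitted (search query or final answer)
    observation: str   # text returned to the agent ("" for answer steps)


@dataclass
class Trajectory:
    steps: List[Step]
    gold_answer: str


def solving_cost(traj: Trajectory) -> int:
    """Assumed definition: number of search steps before the first answer step."""
    cost = 0
    for step in traj.steps:
        if step.action == "answer":
            break
        if step.action == "search":
            cost += 1
    return cost


def answer_hit_time(traj: Trajectory) -> Optional[int]:
    """Assumed definition: 1-based index of the first search step whose
    observation mentions the gold answer; None if it never appears."""
    gold = traj.gold_answer.lower()
    for i, step in enumerate(traj.steps, start=1):
        if step.action == "search" and gold in step.observation.lower():
            return i
    return None


def is_prior_shortcut(traj: Trajectory) -> bool:
    """Assumed definition: the first query already names the gold answer,
    i.e. the agent bound the answer before retrieving any evidence."""
    if not traj.steps:
        return False
    return traj.gold_answer.lower() in traj.steps[0].query.lower()


def prior_shortcut_rate(trajs: List[Trajectory]) -> float:
    """Fraction of trajectories flagged as prior shortcuts."""
    if not trajs:
        return 0.0
    return sum(is_prior_shortcut(t) for t in trajs) / len(trajs)


# Toy trajectories (invented for illustration only).
searched = Trajectory(
    steps=[
        Step("search", "bridges over the river", "several candidates listed"),
        Step("search", "bridge opened in 1894", "Tower Bridge opened in 1894"),
        Step("answer", "Tower Bridge", ""),
    ],
    gold_answer="Tower Bridge",
)
guessed = Trajectory(
    steps=[
        Step("search", "Tower Bridge opening year", "opened in 1894"),
        Step("answer", "Tower Bridge", ""),
    ],
    gold_answer="Tower Bridge",
)
rate = prior_shortcut_rate([searched, guessed])
```

Under these assumed definitions, the first trajectory reaches the answer only after two searches, while the second names the answer in its opening query and would be flagged as a prior-knowledge shortcut.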

Who it's for & tradeoffs

Great fit if you are developing or evaluating deep search/QA agents and need training trajectories that enforce multi-step evidence acquisition rather than surface heuristics. Look elsewhere if your task requires naturalistic, human-generated question-answer pairs without synthetic control (FORT intentionally modifies generation to avoid shortcuts) or if you need methods focused on reinforcement learning rather than SFT-driven training.

Information

  • Website: arxiv.org
  • Authors: Jia Deng, Yimeng Chen, Xiaoqing Xiang, Ziyang Zeng, Shuo Tang, Wayne Xin Zhao, Feng Chang, Chuan Hao, Yuan Wei, Ran Tao, Bryan Dai, Ji-Rong Wen
  • Published date: 2026/06/10

Categories