Most agent workflows react only to explicit user requests, so they miss useful problems that are latent in the broader context. TIDE reframes this as a proactive multi-problem discovery task. It shows that combining iterative rounds of candidate generation with reusable thought templates uncovers coexisting issues more comprehensively, while grounding each discovery in evidence and a suggested action.
Key Findings
- Iterative discovery: generating a small batch per round while conditioning on previously found items reduces repetition and extends coverage, so agents find more distinct problems than single-shot predictors.
- Thought templates: distilled, reusable schemas guide what contextual signals to attend to and how to connect them, so individual discoveries are anchored in recognizable problem classes rather than vague claims.
- Empirical validation: evaluated on two realistic settings (personal workspaces and software repositories) across four model backbones, TIDE outperforms single-shot and parallel multi-agent baselines on task coverage, identification accuracy, and actionable resolution rates.
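The iterative discovery loop from the first finding can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes a hypothetical `propose` callable that stands in for the model call; in practice that call would be a prompt listing the items already found. `iterative_discovery`, `toy_proposer`, the string-normalization dedup, and the stop-when-nothing-new rule are all illustrative choices.

```python
from typing import Callable, List

# Hypothetical proposer signature: (context, already_found, batch_size) -> candidates.
# In a real agent this would be an LLM call whose prompt lists already_found.
Proposer = Callable[[str, List[str], int], List[str]]


def normalize(item: str) -> str:
    """Crude duplicate key (an assumption): case- and whitespace-insensitive."""
    return " ".join(item.lower().split())


def iterative_discovery(context: str, propose: Proposer,
                        batch_size: int = 2, max_rounds: int = 5) -> List[str]:
    """Collect distinct problems over several small rounds.

    Each round is conditioned on the items accepted so far, and repeats are
    filtered out. The loop stops early when a round yields nothing new.
    """
    found: List[str] = []
    seen = set()
    for _ in range(max_rounds):
        batch = propose(context, list(found), batch_size)
        fresh = []
        for cand in batch:
            key = normalize(cand)
            if key not in seen:
                seen.add(key)
                fresh.append(cand)
        if not fresh:
            break  # no novelty this round: treat discovery as converged
        found.extend(fresh)
    return found


# Toy stand-in for the model, with made-up issues for illustration only.
POOL = ["Expired API key in config", "Duplicate meeting on Friday",
        "Unpinned dependency in requirements", "Stale TODO in parser"]


def toy_proposer(context: str, already_found: List[str], k: int) -> List[str]:
    # Repeats one known item (if any) to exercise the dedup filter,
    # then offers up to k pool items that have not been found yet.
    known = {normalize(x) for x in already_found}
    unseen = [p for p in POOL if normalize(p) not in known]
    return already_found[:1] + unseen[:k]
```

With `max_rounds=1`, the same loop behaves like a single small-batch pass and returns only the first two items. With the default settings, it recovers all four items over three rounds, because the third round produces nothing new and ends the loop.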
Who it's for and tradeoffs
Great fit if you build agent systems that should proactively surface latent issues (e.g., assistant features over documents, codebases, or toolchains) and need each discovery to be evidence-backed and operational, that is, paired with an action. Look elsewhere in two cases. The first is an environment that demands strict, deterministic rule-based alerts, because TIDE relies on learned, LLM-backed behavior. The second is a setting where per-discovery latency must be minimal, because iterative rounds add compute and interaction steps compared with a single-shot pass.
Method overview
TIDE combines two complementary mechanisms. The first is iterative candidate generation, which conditions each round on previously accepted discoveries to prioritize novelty and coverage. The second is thought templates: reusable, distilled schemas that specify which signals to check and how to connect them. The templates reduce generic, unsupported claims and make outputs easier to interpret and act on. The paper also analyzes results across model backbones and application domains to show practical gains and characterize typical failure modes.
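To make the role of a thought template concrete, here is a minimal sketch under simplifying assumptions. A template is reduced to three parts: a named problem class, the signals that must co-occur, and the action to suggest when they do. `ThoughtTemplate`, `apply_template`, `discover`, and the signal names and evidence strings are hypothetical illustrations, not the paper's representation. The literal key matching is also a simplification chosen for clarity. The point it shows is that a discovery is emitted only when its supporting evidence is present, and that it always carries a suggested action.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ThoughtTemplate:
    """Hypothetical schema: problem class, required signals, suggested action."""
    problem_class: str
    required_signals: Tuple[str, ...]
    suggested_action: str


@dataclass
class Discovery:
    problem_class: str
    evidence: List[str]
    suggested_action: str


def apply_template(tpl: ThoughtTemplate,
                   signals: Dict[str, str]) -> Optional[Discovery]:
    """Emit a discovery only if every required signal was observed.

    `signals` maps signal names to evidence text found in the context.
    A missing signal would make the claim unsupported, so return None.
    """
    if not all(s in signals for s in tpl.required_signals):
        return None
    evidence = [f"{s}: {signals[s]}" for s in tpl.required_signals]
    return Discovery(tpl.problem_class, evidence, tpl.suggested_action)


def discover(templates: List[ThoughtTemplate],
             signals: Dict[str, str]) -> List[Discovery]:
    """Apply every template and keep only the evidence-backed discoveries."""
    results = []
    for tpl in templates:
        found = apply_template(tpl, signals)
        if found is not None:
            results.append(found)
    return results


# Toy template and context, made up for illustration.
stale_secret = ThoughtTemplate(
    problem_class="credential about to expire",
    required_signals=("key_in_config", "expiry_soon"),
    suggested_action="Rotate the key and update the config before it expires.",
)
ctx = {"key_in_config": "API_KEY defined in config.yaml",
       "expiry_soon": "key expires in 3 days"}
```

In this sketch, `apply_template(stale_secret, ctx)` returns a `Discovery` whose evidence lists both signals. Dropping either signal from `ctx` makes it return `None` instead of a vague, unsupported claim.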
