AI Agent Papers, 2026

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Guides LLM-based agents to decompose long-horizon research problems and delegate subtasks to constrained subagents, then fine-tunes models on harness-generated trajectories so delegation decisions become internalized. Reports SearchSwarm-30B-A3B achieving top BrowseComp scores for its scale.

Introduction

Why this matters

Large language models face a practical bottleneck: context windows are finite, while real research requires chaining many decisions and external lookups over long horizons. The surprising gap is not compute but delegation intelligence: knowing when to split, what to delegate, and how to integrate concise returns so the main agent can continue without exhausting context.

Key Findings

Harness-guided trajectories: The authors design a harness that constrains subagents to return concise, structured summaries and steers the main agent toward high-quality task decomposition. Those guided trajectories encode correct delegation decisions suitable for supervised fine-tuning.
Supervised fine-tuning for delegation: By training on harness-generated examples, the resulting model internalizes when and how to delegate rather than relying on brittle inference-time orchestration.
Empirical gains: Their 30B model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, reported as the best among comparable-scale models in their evaluation.
Practical workflow savings: Delegation reduces context load by having subagents perform focused searches and return summarized results, enabling longer multi-step research workflows without blowing the main agent’s context budget.
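
To make the context-saving argument concrete, here is a rough back-of-the-envelope sketch. All token counts are hypothetical and chosen only for illustration; none of them come from the paper, and the function name `main_context_tokens` is an assumption.

```python
# Hypothetical numbers for illustration only; not taken from the paper.

def main_context_tokens(num_subtasks: int,
                        raw_tokens_per_subtask: int,
                        summary_tokens_per_subtask: int,
                        delegate: bool) -> int:
    """Tokens added to the main agent's context across all subtasks.

    Without delegation the main agent reads every raw search result itself;
    with delegation it only reads the subagent's summary.
    """
    per_subtask = summary_tokens_per_subtask if delegate else raw_tokens_per_subtask
    return num_subtasks * per_subtask


# Assumed workload: 20 lookups, 8,000 raw tokens each, 300-token summaries.
no_delegation = main_context_tokens(20, 8000, 300, delegate=False)
with_delegation = main_context_tokens(20, 8000, 300, delegate=True)

print(no_delegation, with_delegation)  # 160000 6000
```

Under these assumed numbers, the main agent's context grows by 6,000 tokens instead of 160,000, which is the kind of headroom that lets a longer workflow fit in a fixed window.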

Who it fits and tradeoffs

Great fit if you need an LLM agent to run long-horizon, research-style tasks where (1) iterative web/search calls are required, (2) structured summaries from subagents suffice, and (3) you can fine-tune models with synthetic supervised trajectories. Look elsewhere if you require rich, unabridged subagent outputs (not summaries), guaranteed real-time interactivity with complex external tools, or if you cannot afford 30B-scale models and their inference costs.

Method snapshot

The paper’s pipeline: (1) define a harness that enforces decomposition quality and summary-return constraints for subagents; (2) generate end-to-end trajectories in which the main agent delegates and subagents execute constrained subtasks and return structured summaries; (3) use those trajectories as supervised fine-tuning data so the main model learns delegation policies embedded in its weights. The authors plan to release the harness, training data, and model weights to support reproducibility and follow-up research.
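
The three pipeline steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' harness: every name here (`Summary`, `Trajectory`, `run_subagent`, `run_main_agent`, `to_sft_example`, `MAX_SUMMARY_CHARS`) is hypothetical, the decomposer and search function are toy stubs, and the character cap stands in for whatever summary constraint the real harness enforces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_SUMMARY_CHARS = 200  # hypothetical cap on what a subagent may return


@dataclass
class Summary:
    subtask: str
    findings: str


@dataclass
class Trajectory:
    question: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (action, payload)


def run_subagent(subtask: str, search: Callable[[str], str]) -> Summary:
    """Step 1: run one constrained subtask and return only a capped summary."""
    raw = search(subtask)  # the full search output stays inside the subagent
    return Summary(subtask=subtask, findings=raw[:MAX_SUMMARY_CHARS])


def run_main_agent(question: str,
                   decompose: Callable[[str], List[str]],
                   search: Callable[[str], str]) -> Trajectory:
    """Step 2: delegate each subtask and record the resulting trajectory."""
    traj = Trajectory(question=question)
    for subtask in decompose(question):
        traj.steps.append(("delegate", subtask))
        summary = run_subagent(subtask, search)
        traj.steps.append(("summary", summary.findings))
    return traj


def to_sft_example(traj: Trajectory) -> dict:
    """Step 3: flatten a trajectory into a prompt/target pair for fine-tuning."""
    target = "\n".join(f"{action}: {payload}" for action, payload in traj.steps)
    return {"prompt": traj.question, "target": target}


# Toy stubs standing in for a real decomposer and search tool.
def toy_decompose(question: str) -> List[str]:
    return [f"{question} (part {i})" for i in (1, 2)]


def toy_search(subtask: str) -> str:
    return "x" * 1000  # pretend this is a long page of raw results


traj = run_main_agent("Q", toy_decompose, toy_search)
example = to_sft_example(traj)
```

The design point the sketch tries to capture is that the main agent's trajectory only ever contains delegation decisions and capped summaries, so the fine-tuning target teaches the delegation pattern rather than the raw search content.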

Information

Website: arxiv.org
Authors: Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou
Published date: 2026/06/08

More Items

AI Dataset, 2026

K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

Hao Liang, Qihan Lin +6

Provides a curriculum-aligned knowledge graph extracted from Chinese K–12 textbooks and accompanying benchmarks and training data to evaluate and train educational LLMs. Releases a 23,640-question multi-select benchmark and a 7,335-sample graph-guided training corpus with multimodal VQA pairs and the full construction pipeline.

benchmark multimodal vision llm nlp +6

AI Agent Papers, 2026

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Shuqi Lu, Chaofan Li +21, Beijing Academy of Artificial Intelligence (BAAI)

Alternates targeted research and constraint-wise audits to recursively improve long-horizon answers: an inner loop gathers evidence and drafts solutions, an outer loop audits unresolved claims and launches focused follow-ups. Trains 4B dense and 122B-A10B MoE agents with long-horizon RL and agentic mid-training, outperforming comparable-scale baselines on multi-step research benchmarks.

agent-skills ai-agent RL reasoning llm +1

Reinforcement Learning Papers, 2026

Beyond Euclidean Clipping: Overcoming Exploration Collapse in LLM RL via Riemannian Isometric Policy Optimization

Zhicheng Cai, Xinyuan Guo +5

Proposes Riemannian Isometric Policy Optimization (RIPO) to fix exploration collapse in PPO-style RL for LLMs by aligning policy updates with the policy manifold's Riemannian geometry, improving exploration–exploitation balance and optimization stability across competition benchmarks.

RL LLM paper nlp reasoning +2