AI Agent Papers, 2026

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Lets an AI agent propose, run, and evaluate multi-step research experiments using a persistent Hypothesis Tree that links hypotheses, artifacts, evidence, and distilled insights. Combines a long-lived coordinator with short-lived executors to carry lessons across time; evaluated on six ML tasks.

Introduction

Most attempts to automate experiments treat each run as a one-off. The core insight here is that autonomous research becomes substantially more productive when experiments, evidence, and distilled lessons are stored and used to guide future strategy — turning many local trials into a cumulative, long-horizon search.

Key Findings

Persistent Hypothesis Tree: Hypothesis-Tree Refinement (HTR) maintains a tree of linked hypotheses, artifacts, evidence, and distilled insights so that successful lessons propagate to future experiments, which reduces repeated dead ends and focuses the search. (So what: saves compute and accelerates progress by reusing verified improvements.)
Two-tier architecture: A long-lived coordinator manages strategy over the tree, while short-lived executors implement and test individual hypotheses in isolated worktrees. (So what: separates meta-level strategy from noisy experiment execution, improving fault isolation and reproducibility.)
Strong empirical gains under Autonomous Optimization: Across six real research tasks (training, harness engineering, data synthesis), the framework outperforms baseline agents and achieves substantially higher held-out gains. (So what: demonstrates practical benefit on end-to-end research-style workflows.)
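
To make the first finding concrete, here is a minimal sketch of a tree node that links a hypothesis to its artifacts, evidence, and distilled insights. It is an illustration only, not the paper's implementation: the class name, field names, and the `inherited_insights` helper are assumptions, and the example strings and score are placeholder values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HypothesisNode:
    """One node of a hypothetical hypothesis tree (all names are assumptions)."""
    hypothesis: str
    artifacts: list[str] = field(default_factory=list)   # e.g. paths to code or checkpoints
    evidence: list[float] = field(default_factory=list)  # e.g. held-out scores from runs
    insights: list[str] = field(default_factory=list)    # lessons distilled at this node
    parent: Optional["HypothesisNode"] = None
    children: list["HypothesisNode"] = field(default_factory=list)

    def add_child(self, hypothesis: str) -> "HypothesisNode":
        """Attach a refined hypothesis beneath this one."""
        child = HypothesisNode(hypothesis=hypothesis, parent=self)
        self.children.append(child)
        return child

    def inherited_insights(self) -> list[str]:
        """Collect insights from the root down to this node, so a new
        experiment can see the lessons recorded by its ancestors."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        lessons = []
        for ancestor in reversed(chain):  # root first, this node last
            lessons.extend(ancestor.insights)
        return lessons


# Placeholder content, purely for illustration.
root = HypothesisNode("Improve held-out accuracy of the baseline")
root.insights.append("Large learning rates diverged")
aug = root.add_child("Add data augmentation")
aug.evidence.append(0.81)
aug.insights.append("Augmentation helps only with longer training")
follow_up = aug.add_child("Augmentation plus longer schedule")
print(follow_up.inherited_insights())
```

A new branch such as `follow_up` starts with the lessons of every ancestor, which is one simple way lessons could propagate to later experiments.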

Who it's for and tradeoffs

Great fit if you want an agentic system that can run iterative ML experiments with minimal step-level supervision and you care about accumulating reusable lessons over many trials. Look elsewhere if you need lightweight single-shot automation, have strict reproducibility constraints across external environments, or lack the compute/resources for long-horizon agent runs. The approach depends on careful orchestration (coordinator + executors), reliable experiment isolation, and the quality of underlying model-based evaluators.

How it works (brief)

Coordinator components manage global policy and update the Hypothesis Tree as results arrive; executors create isolated worktrees to implement and test individual hypotheses. HTR admits verified improvements, propagates distilled heuristics, and refines the search frontier, turning episodic attempts into cumulative progress. Evaluations use an Autonomous Optimization protocol and benchmarks (e.g., MLE-Bench Lite) to measure held-out gains.
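
The coordinator and executor split described above could be sketched roughly as follows. This is a toy under stated assumptions, not the authors' system: a temporary directory stands in for an isolated worktree, "verified improvement" is reduced to beating the best score so far, and `Coordinator`, `run_executor`, `toy_evaluate`, and the scores in `TOY_SCORES` are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

Evaluator = Callable[[str, Path], float]


def run_executor(hypothesis: str, evaluate: Evaluator) -> float:
    """Short-lived executor: tests one hypothesis in a throwaway directory.
    The temporary directory stands in for an isolated worktree (assumption)."""
    with tempfile.TemporaryDirectory() as workdir:
        return evaluate(hypothesis, Path(workdir))


class Coordinator:
    """Long-lived coordinator: keeps state across many executor runs."""

    def __init__(self, baseline_score: float) -> None:
        self.best_score = baseline_score
        self.admitted: list[str] = []  # hypotheses that beat the best score so far
        self.insights: list[str] = []  # lessons carried into later rounds

    def step(self, hypothesis: str, evaluate: Evaluator) -> bool:
        """Run one hypothesis and admit it only if it improves the best score."""
        score = run_executor(hypothesis, evaluate)
        if score > self.best_score:
            self.best_score = score
            self.admitted.append(hypothesis)
            self.insights.append(f"kept: {hypothesis} ({score:.2f})")
            return True
        self.insights.append(f"dropped: {hypothesis} ({score:.2f})")
        return False


# Placeholder scores; a real evaluator would train and measure a model.
TOY_SCORES = {
    "tune learning rate": 0.72,
    "add augmentation": 0.78,
    "remove dropout": 0.70,
}


def toy_evaluate(hypothesis: str, workdir: Path) -> float:
    # Write a marker file to show that work happens inside the isolated directory.
    (workdir / "run.log").write_text(hypothesis)
    return TOY_SCORES[hypothesis]


coordinator = Coordinator(baseline_score=0.70)
for candidate in TOY_SCORES:
    coordinator.step(candidate, toy_evaluate)

print(coordinator.admitted)    # ['tune learning rate', 'add augmentation']
print(coordinator.best_score)  # 0.78
```

Each executor's directory is deleted when its run ends, so only the score and the recorded lesson survive in the coordinator. That mirrors the idea of carrying lessons across short-lived runs.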

Information

Website: arxiv.org
Authors: Jiajie Jin, Yuyang Hu, Kai Qiu, Qi Dai, Chong Luo, Guanting Dong, Xiaoxi Li, Tong Zhao, Xiaolong Ma, Gongrui Zhang …
Published date: 2026/06/10

More Items

AI Agent Papers, 2026

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Shuqi Lu, Chaofan Li +21, Beijing Academy of Artificial Intelligence (BAAI)

Alternates targeted research and constraint-wise audits to recursively improve long-horizon answers: an inner loop gathers evidence and drafts solutions, an outer loop audits unresolved claims and launches focused follow-ups. Trains 4B dense and 122B-A10B MoE agents with long-horizon RL and agentic mid-training, outperforming comparable-scale baselines on multi-step research benchmarks.

agent-skills ai-agent RL reasoning llm +1

Large Language Model Papers, 2026

Scaling Laws for Hypernetwork-Based Knowledge Injection in Large Language Models

Nischay Dhankhar, Dos Baha +1

Studies train-time knowledge injection via hypernetworks that generate fixed LoRA adapters from large fact corpora, empirically characterizing power-law scaling across hypernetwork depth, width, and target model size and reporting improved OOD generalization.

LLM NLP paper lora reasoning +3

AI Agent Papers, 2026

DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines

Runming He, Zhen Hao Wong +5

Guides an LLM agent to build persistent, editable DAG-based data pipelines via typed, incremental mutations instead of free-form scripts. Combines DataFlow-Skills, a Model Context Protocol exposing live operator registry and pipeline state, and a synchronized Web UI; achieves 93.3% end-to-end pass rate on a 12-task benchmark while cutting cost and latency versus script baselines.

agent-skills mcp ai-agent ai-workflow LLM +4
