Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

Khushal Sethi

Recommendation Score

significant🟡 IntermediateReasoning & Agents AI AgentsBenchmarkUseful for both

Research context

Primary field

Reasoning & Agents

Reasoning, planning, tool use, and agentic workflows.

Topics

AI Agents

Paper type

Benchmark

Best for

Useful for both

arXiv categories

cs.AIcs.CLcs.MAcs.AI

Why It Matters

TrACE spends extra rollouts only on uncertain agent steps, matching fixed self-consistency accuracy with far fewer model calls and offering an easy path to cheaper agent inference.

Abstract

Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents, but existing methods apply compute uniformly: every decision step receives the same budget regardless of its difficulty. We introduce TrACE (Trajectorical Adaptive Compute via agrEement), a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement. At each step, TrACE samples a small set of candidate next actions and measures how consistently the model commits to the same action. High agreement signals an easy decision; the controller commits immediately. Low agreement signals uncertainty; the controller samples additional rollouts up to a configurable cap before committing to the plurality action. No learned components, no external verifier, and no human labels are required. We evaluate TrACE against greedy decoding and fixed-budget self-consistency (SC-4, SC-8) on two benchmarks spanning single-step reasoning (GSM8K, n=50) and multi-step household navigation (MiniHouse, n=30), using a Qwen 2.5 3B Instruct model running on CPU. TrACE-4 matches SC-4 accuracy while using 33% fewer LLM calls on GSM8K and 39% fewer on MiniHouse. TrACE-8 matches SC-8 accuracy with 55% fewer calls on GSM8K and 65% fewer on MiniHouse. We further show that inter-rollout agreement is a reliable signal of step-level success, validating the core hypothesis that the model's own output consistency encodes difficulty information that can be exploited without training. TrACE is the first training-free, per-timestep adaptive-compute controller for LLM agents to be evaluated on multi-step sequential decision tasks.

More in Reasoning & Agents → More on AI Agents →

View on arXiv → Download PDF →

Published April 9, 2026