Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li

Recommendation Score

significant🔴 AdvancedReasoning & Agents AI AgentsSystemUseful for both

Research context

Primary field

Reasoning & Agents

Reasoning, planning, tool use, and agentic workflows.

Topics

AI Agents

Paper type

System

Best for

Useful for both

arXiv categories

cs.AIcs.AI

Why It Matters

A small RL-trained ContextCurator learns to trim noisy history while preserving reasoning anchors, boosting long-horizon agents and slashing token use up to 8x.

Abstract

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it achieves a 57.1% success rate, compared with 53.9%, while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.

More in Reasoning & Agents → More on AI Agents →

View on arXiv → Download PDF →

Published April 13, 2026