Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
Bo Li, Mingda Wang, Gexiang Fang, Shikun Zhang, Wei Ye
Topics: RAG
Paper type: Benchmark
Best for: Useful for both
Why It Matters
GRIP turns retrieval into a native decoding action so the model can decide when to search, rewrite queries, and stop inside one reasoning trace instead of bolting on a controller.
Abstract
We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling end-to-end coordination without additional controllers or classifiers. Under the paradigm of Retrieval as Generation, we propose GRIP (Generation-guided Retrieval with Information Planning), a unified framework in which the model regulates retrieval behavior through control-token emission. Central to GRIP is Self-Triggered Information Planning, which allows the model to decide when to retrieve, how to reformulate queries, and when to terminate, all within a single autoregressive trajectory. This design tightly couples retrieval and reasoning and supports dynamic multi-step inference with on-the-fly evidence integration. To supervise these behaviors, we construct a structured training set covering answerable, partially answerable, and multi-hop queries, each aligned with specific token patterns. Experiments on five QA benchmarks show that GRIP surpasses strong RAG baselines and is competitive with GPT-4o while using substantially fewer parameters.
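The control-token mechanism described in the abstract can be pictured as a decoding loop that watches the model's own output stream for retrieval markers. The sketch below is illustrative only: the token names (`<search>`, `</search>`, `<answer>`), the stubbed decoder, and the stubbed retriever are hypothetical stand-ins, not GRIP's actual vocabulary or training setup.

```python
def fake_retriever(query):
    # Stand-in for a search backend: returns a canned passage.
    corpus = {"capital of France": "Paris is the capital of France."}
    return corpus.get(query, "")

def fake_decoder(context):
    # Stand-in for the LM: emits a scripted trajectory in which the model
    # first decides to retrieve, then answers from the fetched evidence.
    if "Paris" not in context:
        return ["<search>", "capital of France", "</search>"]
    return ["<answer>", "Paris", "</answer>"]

def generate(question, max_rounds=4):
    """Single autoregressive trajectory: retrieval and termination are
    both expressed as control-token emissions, with no external controller."""
    trace = [question]
    for _ in range(max_rounds):
        tokens = fake_decoder(" ".join(trace))
        trace.extend(tokens)
        if tokens[0] == "<search>":
            # Control tokens delimit the (possibly reformulated) query;
            # retrieved evidence is appended to the context so the next
            # decoding step conditions on it (on-the-fly integration).
            query = " ".join(tokens[1:-1])
            trace.append(fake_retriever(query))
        elif tokens[0] == "<answer>":
            # Stopping is itself a decoding decision.
            return tokens[1]
    return None

print(generate("What is the capital of France?"))  # → Paris
```

In a real system the stubbed decoder would be replaced by sampling from the trained model, with the control tokens added to its vocabulary so that "when to retrieve" is learned rather than scripted.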