Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
Bo Li, Mingda Wang, Gexiang Fang, Shikun Zhang, Wei Ye
Topics: RAG
Paper type: Benchmark
Best for: Useful for both
Why It Matters
GRIP turns retrieval into a native decoding action so the model can decide when to search, rewrite queries, and stop inside one reasoning trace instead of bolting on a controller.
Abstract
We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling end-to-end coordination without additional controllers or classifiers. Under the paradigm of Retrieval as Generation, we propose GRIP (Generation-guided Retrieval with Information Planning), a unified framework in which the model regulates retrieval behavior through control-token emission. Central to GRIP is Self-Triggered Information Planning, which allows the model to decide when to retrieve, how to reformulate queries, and when to terminate, all within a single autoregressive trajectory. This design tightly couples retrieval and reasoning and supports dynamic multi-step inference with on-the-fly evidence integration. To supervise these behaviors, we construct a structured training set covering answerable, partially answerable, and multi-hop queries, each aligned with specific token patterns. Experiments on five QA benchmarks show that GRIP surpasses strong RAG baselines and is competitive with GPT-4o while using substantially fewer parameters.
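The control-token mechanism described in the abstract can be pictured as a decoding loop that watches the model's own output stream for retrieval markers. The sketch below is illustrative only: the token names (`<search>`, `</search>`, `<answer>`), the stubbed decoder, and the stubbed retriever are hypothetical stand-ins, not GRIP's actual vocabulary or training setup.

```python
def fake_retriever(query):
    # Stand-in for a search backend: returns a canned passage.
    corpus = {"capital of France": "Paris is the capital of France."}
    return corpus.get(query, "")

def fake_decoder(context):
    # Stand-in for the LM: emits a scripted trajectory in which the model
    # first decides to retrieve, then answers from the fetched evidence.
    if "Paris" not in context:
        return ["<search>", "capital of France", "</search>"]
    return ["<answer>", "Paris", "</answer>"]

def generate(question, max_rounds=4):
    """Single autoregressive trajectory: retrieval and termination are
    both expressed as control-token emissions, with no external controller."""
    trace = [question]
    for _ in range(max_rounds):
        tokens = fake_decoder(" ".join(trace))
        trace.extend(tokens)
        if tokens[0] == "<search>":
            # Control tokens delimit the (possibly reformulated) query;
            # retrieved evidence is appended to the context so the next
            # decoding step conditions on it (on-the-fly integration).
            query = " ".join(tokens[1:-1])
            trace.append(fake_retriever(query))
        elif tokens[0] == "<answer>":
            # Stopping is itself a decoding decision.
            return tokens[1]
    return None

print(generate("What is the capital of France?"))  # → Paris
```

In a real system the stubbed decoder would be replaced by sampling from the trained model, with the control tokens added to its vocabulary so that "when to retrieve" is learned rather than scripted.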