From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines

Sunkyung Lee, Jihye Back, Donghyeon Jeon, Soonhwan Kwon, Moonkwon Kim, Inho Kang, Jongwuk Lee

Recommendation Score

breakthrough🟡 IntermediateNLP RAGSystemBest for builders

Research context

Primary field

NLP

Language understanding, generation, extraction, and evaluation.

Topics

RAG

Paper type

System

Best for

Best for builders

arXiv categories

cs.IRcs.CLcs.IR

Why It Matters

Introduces authority-aware generation in retrieval, directly improving trustworthiness in high-stakes domains by biasing LLMs toward credible sources—not just relevance—enabling safer deployment in healthcare and finance.

Abstract

Generative information retrieval (GenIR) formulates the retrieval process as a text-to-text generation task, leveraging the vast knowledge of large language models. However, existing works primarily optimize for relevance while often overlooking document trustworthiness. This is critical in high-stakes domains like healthcare and finance, where relying solely on semantic relevance risks retrieving unreliable information. To address this, we propose an Authority-aware Generative Retriever (AuthGR), the first framework that incorporates authority into GenIR. AuthGR consists of three key components: (i) Multimodal Authority Scoring, which employs a vision-language model to quantify authority from textual and visual cues; (ii) a Three-stage Training Pipeline to progressively instill authority awareness into the retriever; and (iii) a Hybrid Ensemble Pipeline for robust deployment. Offline evaluations demonstrate that AuthGR successfully enhances both authority and accuracy, with our 3B model matching a 14B baseline. Crucially, large-scale online A/B tests and human evaluations conducted on the commercial web search platform confirm significant improvements in real-world user engagement and reliability.