Researcher Mode
Explore by field and topic
Jump straight into the slice of AI research you care about. The taxonomy layer turns the daily feed into a browsable map: fields for broad domains, topics for recurring research questions, and paper-level tags for faster triage.
Landmark guides
Long-arc reading paths for understanding a field, not just today’s feed.
Landmark Guide
12 Papers That Built Modern LLMs
A beginner-friendly map of the papers that shaped transformers, scaling, alignment, and the open-source LLM era.
Landmark Guide
12 Papers That Shaped Modern AI Agents
A beginner-friendly map of the ideas behind tool use, planning, memory, multi-agent workflows, and software agents.
Landmark Guide
12 Papers That Shaped Modern RAG
A beginner-friendly map of the ideas behind dense retrieval, retrieval-augmented generation, self-correction, and structured retrieval.
Landmark Guide
12 Papers That Shaped Modern Computer Use Agents
A beginner-friendly map of the papers behind web agents, GUI grounding, smartphone control, and full computer-use benchmarks.
Landmark Guide
12 Papers That Shaped Modern AI Coding Agents
A beginner-friendly map of code LLMs, repo grounding, software-engineering benchmarks, and modern SWE agents.
Active fields: 5
Tracked topics: 15
Papers in latest available release: 15
Fields
Broad domains for navigating the archive at a glance.
Reasoning & Agents
Reasoning, planning, tool use, and agentic workflows.
Recent picks
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
NLP
Language understanding, generation, extraction, and evaluation.
Recent picks
Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo
Machine Learning
Core modeling, optimization, inference, and systems efficiency.
Recent picks
Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents
MEMENTO: Teaching LLMs to Manage Their Own Context
KV Cache Offloading for Context-Intensive Tasks
Computer Vision
Image, video, and 3D perception plus visual generation.
Recent picks
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation
Robotics
Embodied systems, control, manipulation, and navigation.
Recent picks
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
Topics
Recurring problems and methods worth following over time.
AI Agents
Agentic systems, multi-agent coordination, and task planning.
Recent picks
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
LLM Reasoning
Structured reasoning, proof solving, and long-chain problem solving.
Recent picks
Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
Decomposing and Reducing Hidden Measurement Error in LLM Evaluation Pipelines
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
Efficient Inference
Latency, serving, cache efficiency, and practical inference speed.
Recent picks
Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents
MEMENTO: Teaching LLMs to Manage Their Own Context
KV Cache Offloading for Context-Intensive Tasks
Alignment & Safety
Alignment, preference learning, robustness, and safe deployment.
Recent picks
ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
RAG
Retrieval-augmented generation systems, evaluation, and retrieval-heavy workflows.
Recent picks
Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo
RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval
Tool Use
Function calling, API integration, and tool-augmented model behavior.
Recent picks
EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
Embodied Agents
Reasoning and action grounded in the physical world.
Recent picks
ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
Video Generation
Video synthesis, editing, and temporal generation systems.
Recent picks
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
Physics-Aware Video Instance Removal Benchmark
Latest highlighted papers
Today's arXiv batch has not landed yet. Showing the latest available release from Tuesday, April 14, 2026.
Yijuan Liang, Xinghao Chen, Yifan Ge et al.
UniToolCall is a unified 22k-tool, 390k-example tool-use stack that standardizes data and evaluation and lets an 8B model beat major commercial models on hard, distractor-heavy tool calling.
Haoran Ding, Zhaoguo Wang, Haibo Chen
FM-Agent brings Hoare-style reasoning to 143k-line systems by inferring specs from caller intent, surfacing 522 new bugs in already-tested codebases.
Xiaomeng Hu, Yinger Zhang, Fei Huang et al.
OccuBench is a 100-scenario benchmark for professional agents across 65 domains that also injects hidden environment faults, exposing how brittle frontier models still are in real work settings.
Jinhua Wang, Biswa Sengupta
This benchmark-driven translation of a production AI coding agent from Rust to Python shows how LLMs can migrate large systems continuously while staying competitive on real agent benchmarks.
CocoaBench Team, Shibo Hao, Zhining Zhang et al.
CocoaBench is a strong reality check for unified digital agents, with long-horizon tasks that force systems to combine vision, search, and coding in one workflow.
Liujie Zhang, Benzhe Ning, Rui Yang et al.
Relax is an open asynchronous RL engine for omni-modal post-training that doubles throughput on Qwen3-Omni-scale runs without sacrificing convergence.