Taxonomy

Topics

Topics cut across fields and help you follow specific problems, methods, and workflows such as RAG, tool use, efficient inference, or embodied agents.

AI Agents

Agentic systems, multi-agent coordination, and task planning.

37 papers

Recent picks

UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents

FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

LLM Reasoning

Structured reasoning, theorem proving, and long-chain problem solving.

23 papers

Recent picks

Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

Decomposing and Reducing Hidden Measurement Error in LLM Evaluation Pipelines

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

Efficient Inference

Latency, serving, cache efficiency, and practical inference speed.

10 papers

Recent picks

Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents

MEMENTO: Teaching LLMs to Manage Their Own Context

KV Cache Offloading for Context-Intensive Tasks

Alignment & Safety

Alignment, preference learning, robustness, and safe deployment.

8 papers

Recent picks

ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models

Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment

RAG

Retrieval-augmented generation systems, evaluation, and retrieval-heavy workflows.

6 papers

Recent picks

Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo

RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval

Tool Use

Function calling, API integration, and tool-augmented model behavior.

4 papers

Recent picks

EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms

Embodied Agents

Reasoning and action grounded in the physical world.

3 papers

Recent picks

ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

Video Generation

Video synthesis, editing, and temporal generation systems.

3 papers

Recent picks

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

Physics-Aware Video Instance Removal Benchmark

3D Vision

3D perception, reconstruction, neural rendering, and spatial reasoning.

2 papers

Recent picks

Less Detail, Better Answers: Degradation-Driven Prompting for VQA

Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction

Diffusion Models

Diffusion-based generation for images, video, and multimodal outputs.

2 papers

Recent picks

SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Model Compression

Quantization, pruning, distillation, and smaller deployment footprints.

2 papers

Recent picks

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

Don't Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMs

Robot Manipulation

Embodied control and robot interaction with objects.

2 papers

Recent picks

PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing

AnyUser: Translating Sketched User Intent into Domestic Robots

Multimodal Understanding

Cross-modal understanding across text, image, video, and audio.

1 paper

Recent picks

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Vision-Language Models

Vision-language models that connect text and perception.

1 paper

Recent picks

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

World Models

Representation learning for long-horizon decision making and planning.

1 paper

Recent picks

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.