Taxonomy
Topics
Topics cut across fields and help you follow specific problems, methods, and workflows such as RAG, tool use, efficient inference, or embodied agents.
AI Agents
Agentic systems, multi-agent coordination, and task planning.
Recent picks
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
LLM Reasoning
Structured reasoning, theorem proving, and long-chain problem solving.
Recent picks
Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
Decomposing and Reducing Hidden Measurement Error in LLM Evaluation Pipelines
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
Efficient Inference
Latency, serving, cache efficiency, and practical inference speed.
Recent picks
Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents
MEMENTO: Teaching LLMs to Manage Their Own Context
KV Cache Offloading for Context-Intensive Tasks
Alignment & Safety
Alignment, preference learning, robustness, and safe deployment.
Recent picks
ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
RAG
Retrieval-augmented generation systems, evaluation, and retrieval-heavy workflows.
Recent picks
Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo
RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval
Tool Use
Function calling, API integration, and tool-augmented model behavior.
Recent picks
EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
Embodied Agents
Reasoning and action grounded in the physical world.
Recent picks
ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
Video Generation
Video synthesis, editing, and temporal generation systems.
Recent picks
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
Physics-Aware Video Instance Removal Benchmark
3D Vision
3D perception, reconstruction, neural rendering, and spatial reasoning.
Recent picks
Less Detail, Better Answers: Degradation-Driven Prompting for VQA
Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction
Diffusion Models
Diffusion-based generation for images, video, and multimodal outputs.
Recent picks
SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
Model Compression
Quantization, pruning, distillation, and smaller deployment footprints.
Recent picks
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Don't Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMs
Robot Manipulation
Embodied control and robot interaction with objects.
Recent picks
PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
AnyUser: Translating Sketched User Intent into Domestic Robots
Multimodal Understanding
Cross-modal understanding across text, image, video, and audio.
Recent picks
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
Vision-Language Models
Vision-language models that connect text and perception.
Recent picks
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
World Models
Representation learning for long-horizon decision making and planning.
Recent picks
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models