Topics
Topics cut across fields and help you follow specific problems, methods, and workflows, such as retrieval-augmented generation (RAG), tool use, efficient inference, and embodied agents.
LLM Reasoning
Papers about structured reasoning, theorem proving, and long-chain problem solving.
Recent picks
Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure
AVISE: Framework for Evaluating the Security of AI Systems
CHASM: Unveiling Covert Advertisements on Chinese Social Media
AI Agents
Agentic systems, multi-agent coordination, and task planning.
Recent picks
Interval POMDP Shielding for Imperfect-Perception Agents
ActuBench: A Multi-Agent LLM Pipeline for Generation and Evaluation of Actuarial Reasoning Tasks
Stateless Decision Memory for Enterprise AI Agents
Efficient Inference
Latency, serving, cache efficiency, and practical inference speed.
Recent picks
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling
Super Apriel: One Checkpoint, Many Speeds
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
Alignment & Safety
Alignment, preference learning, robustness, and safe deployment.
Recent picks
SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models
Using large language models for embodied planning introduces systematic safety risks
Mind DeepResearch Technical Report
RAG
Retrieval-augmented generation systems, evaluation, and retrieval-heavy workflows.
Recent picks
HaS: Accelerating RAG through Homology-Aware Speculative Retrieval
Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents
Model Compression
Quantization, pruning, distillation, and smaller deployment footprints.
Recent picks
AAC: Admissible-by-Architecture Differentiable Landmark Compression for ALT
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling
3D Vision
3D perception, reconstruction, neural rendering, and spatial reasoning.
Recent picks
GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers
APC: Transferable and Efficient Adversarial Point Counterattack for Robust 3D Point Cloud Recognition
Rethinking Patient Education as Multi-turn Multi-modal Interaction
Diffusion Models
Diffusion-based generation for images, video, and multimodal outputs.
Recent picks
GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers
Wan-Image: Pushing the Boundaries of Generative Visual Intelligence
EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation
Embodied Agents
Reasoning and action grounded in the physical world.
Recent picks
JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance
Using large language models for embodied planning introduces systematic safety risks
Multimodal Understanding
Cross-modal understanding across text, image, video, and audio.
Recent picks
Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties
Cross-Modal Bayesian Low-Rank Adaptation for Uncertainty-Aware Multimodal Learning
MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents
Tool Use
Function calling, API integration, and tool-augmented model behavior.
Recent picks
EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
Video Generation
Video synthesis, editing, and temporal generation systems.
Recent picks
Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
Robot Manipulation
Embodied control and robot interaction with objects.
Recent picks
$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
AnyUser: Translating Sketched User Intent into Domestic Robots
Vision-Language Models
Models that connect text and visual perception.
Recent picks
PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance
Do VLMs Truly "Read" Candlesticks? A Multi-Scale Benchmark for Visual Stock Price Forecasting
E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes
World Models
Representation learning for long-horizon decision making and planning.
Recent picks
Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
Mask World Model: Predicting What Matters for Robust Robot Policy Learning
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
Fine-tuning & PEFT
Adaptation methods such as LoRA, adapters, and lightweight fine-tuning.
Recent picks
LACE: Lattice Attention for Cross-thread Exploration
Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning
Navigation
Movement, path planning, and spatial decision making.
Recent picks
Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input