Taxonomy

Topics

Topics cut across fields and help you follow specific problems, methods, and workflows such as RAG, tool use, efficient inference, or embodied agents.

LLM Reasoning

Papers about structured reasoning, theorem proving, and long-chain problem solving.

63 papers

Recent picks

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

AVISE: Framework for Evaluating the Security of AI Systems

CHASM: Unveiling Covert Advertisements on Chinese Social Media

AI Agents

Agentic systems, multi-agent coordination, and task planning.

58 papers

Recent picks

Interval POMDP Shielding for Imperfect-Perception Agents

ActuBench: A Multi-Agent LLM Pipeline for Generation and Evaluation of Actuarial Reasoning Tasks

Stateless Decision Memory for Enterprise AI Agents

Efficient Inference

Latency, serving, cache efficiency, and practical inference speed.

27 papers

Recent picks

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

Super Apriel: One Checkpoint, Many Speeds

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

Alignment & Safety

Alignment, preference learning, robustness, and safe deployment.

17 papers

Recent picks

SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models

Using large language models for embodied planning introduces systematic safety risks

Mind DeepResearch Technical Report

RAG

Retrieval-augmented generation systems, evaluation, and retrieval-heavy workflows.

14 papers

Recent picks

HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents

Model Compression

Quantization, pruning, distillation, and smaller deployment footprints.

8 papers

Recent picks

AAC: Admissible-by-Architecture Differentiable Landmark Compression for ALT

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

3D Vision

3D perception, reconstruction, neural rendering, and spatial reasoning.

7 papers

Recent picks

GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

APC: Transferable and Efficient Adversarial Point Counterattack for Robust 3D Point Cloud Recognition

Rethinking Patient Education as Multi-turn Multi-modal Interaction

Diffusion Models

Diffusion-based generation for images, video, and multimodal outputs.

7 papers

Recent picks

GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation

Embodied Agents

Reasoning and action grounded in the physical world.

7 papers

Recent picks

JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

Using large language models for embodied planning introduces systematic safety risks

Multimodal Understanding

Cross-modal understanding across text, image, video, and audio.

4 papers

Recent picks

Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties

Cross-Modal Bayesian Low-Rank Adaptation for Uncertainty-Aware Multimodal Learning

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents

Tool Use

Function calling, API integration, and tool-augmented model behavior.

4 papers

Recent picks

EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms

Video Generation

Video synthesis, editing, and temporal generation systems.

4 papers

Recent picks

Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

Robot Manipulation

Embodied control and robot interaction with objects.

3 papers

Recent picks

$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing

AnyUser: Translating Sketched User Intent into Domestic Robots

Vision-Language Models

Models that connect text with visual perception.

3 papers

Recent picks

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

Do VLMs Truly "Read" Candlesticks? A Multi-Scale Benchmark for Visual Stock Price Forecasting

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

World Models

Representation learning for long-horizon decision making and planning.

3 papers

Recent picks

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

Fine-tuning & PEFT

Adaptation methods such as LoRA, adapters, and lightweight fine-tuning.

2 papers

Recent picks

LACE: Lattice Attention for Cross-thread Exploration

Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning

Navigation

Movement, path planning, and spatial decision making.

1 paper

Recent picks

Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input