AI Research Highlights
Wednesday, April 8, 2026
David Picard, Nicolas Dufour, Lucas Degeorge et al.
PoM replaces attention with a linear-time polynomial mixer that preserves universal approximation while sharply cutting compute, a big win for scaling vision and language models on edge devices.
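To make the idea concrete, here is a minimal NumPy sketch of a degree-2 polynomial token mixer with cost linear in sequence length: each token contributes a quadratic feature to one global summary vector, which is then redistributed to every token. The shapes, random projections, and exact mixing form are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def polynomial_mixer(x):
    """Degree-2 polynomial token mixer, O(n) in sequence length n.

    Illustrative sketch: random projections stand in for learned
    weights, and the mixing form is an assumption, not PoM's exact
    layer.
    """
    n, d = x.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((d, d)) / np.sqrt(d)
    B = rng.standard_normal((d, d)) / np.sqrt(d)
    C = rng.standard_normal((d, d)) / np.sqrt(d)
    # Global degree-2 summary: a single pass over tokens, so the cost
    # grows linearly with n (unlike attention's n^2 token pairs).
    m = ((x @ A) * (x @ B)).sum(axis=0) / n  # shape (d,)
    # Redistribute: every token is modulated by the shared summary.
    return x + ((x @ C) * m) / np.sqrt(d)

x = np.random.default_rng(1).standard_normal((16, 8))
y = polynomial_mixer(x)
```

Contrast with attention: there is no n-by-n token-pair score matrix anywhere, which is where the linear-time claim comes from.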
Gustav Keppler, Moritz Gstür, Veit Hagenmeyer
CritBench is the first benchmark evaluating LLM agents on operational technology (OT) protocols such as IEC 61850, exposing critical cybersecurity gaps in industrial systems. Essential for safely deploying LLMs in critical infrastructure.
Bowen Ye, Rang Li, Qibin Yang et al.
Claw-Eval introduces transparent, safety-aware, multimodal evaluation for autonomous agents, closing critical benchmarking gaps. Essential for building trustworthy, real-world AI agents.
Xiaojie Gu, Ziying Huang, Weicong Hong et al.
Exposes how LLMs mimic edits without truly updating stored knowledge, revealing dangerous surface compliance. Vital for builders deploying knowledge-editing tools where factual reliability is non-negotiable.
Tianyi Zhao, Yinhan He, Wendy Zheng et al.
MCircKE mechanistically edits LLM knowledge to fix reasoning gaps, ensuring edited facts propagate through multi-step reasoning chains for reliable deployment.
Eranga Bandara, Ross Gore, Sachin Shetty et al.
Agentic AI automates end-to-end retail supply chains with real-world coordination, reducing manual labor at scale and proving LLM agents can reliably drive high-stakes operational workflows.
Yulin Zou, Yan Chen, Wenyan Chen et al.
CoStream jointly optimizes the video codec and multimodal inference to cut computational costs by more than 40%, enabling scalable, real-time video analytics without sacrificing vision-language model accuracy.
Guhao Feng, Shengjie Luo, Kai Hua et al.
In-Place Test-Time Training lets LLMs adapt their weights during inference, overcoming the limits of static deployment. Vital for real-time systems that need continuous learning from streaming data without retraining.
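The core intuition, updating weights while serving predictions, can be sketched with a toy online learner: a linear model predicts each incoming example, then immediately takes one gradient step on it, so later predictions benefit from earlier stream data. This illustrates the broad idea of test-time training only; the model, loss, and update rule here are assumptions, not the paper's In-Place TTT algorithm.

```python
import numpy as np

def stream_with_ttt(xs, ys, lr=0.05):
    """Generic test-time training sketch: predict, then update.

    Weights change in place during inference, so accuracy improves
    as the stream progresses, with no offline retraining step.
    """
    w = np.zeros(xs.shape[1])
    preds = []
    for x, y in zip(xs, ys):
        pred = w @ x             # inference with current weights
        preds.append(pred)
        grad = (pred - y) * x    # squared-error gradient on this example
        w -= lr * grad           # in-place weight update mid-stream
    return np.array(preds), w

# Stream drawn from a fixed linear rule: errors shrink as w adapts.
rng = np.random.default_rng(0)
true_w = rng.standard_normal(4)
xs = rng.standard_normal((500, 4))
ys = xs @ true_w
preds, w = stream_with_ttt(xs, ys)
early_mse = np.mean((preds[:50] - ys[:50]) ** 2)
late_mse = np.mean((preds[-50:] - ys[-50:]) ** 2)
```

Because the update happens between predictions rather than in a separate training phase, late-stream error is far below early-stream error, which is the behavior static deployment cannot provide.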
Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo et al.
LLM4CodeRE adapts LLMs specifically to malware decompilation, significantly improving reverse-engineering accuracy on obfuscated code. Critical for automated threat analysis in cybersecurity operations.
Maria Nesterova, Mikhail Kolosov, Anton Andreychuk et al.
A single GPT-based model learns diverse multi-agent reinforcement learning (MARL) tasks, eliminating task-specific architectures and enabling scalable, generalizable multi-agent systems without per-environment retraining.
Nirajan Acharya, Gaurav Kumar Gupta
First formal security framework for agents built on the Model Context Protocol (MCP), defining threats and verifiable defenses. Essential for builders deploying LLM agents with external tool access in production environments.
Hiba Dahmani, Nathan Piasco, Moussab Bennehar et al.
SEM-ROVER generates scalable, geometrically coherent 3D driving scenes via semantic voxel-guided diffusion, supporting realistic, large-scale simulation for autonomous driving systems without view limitations.
Wang Yang, Chaoda Song, Xinpeng Li et al.
ACE-Bench cuts agent-evaluation overhead by 41% with controllable, scalable tasks, enabling reliable, repeatable benchmarking of LLM agents for real-world deployment.
Renxuan Tan, Rongpeng Li, Zhifeng Zhao et al.
Introduces Pareto-lenient consensus to avoid premature convergence in multi-preference LLM alignment, enabling robust, nuanced value alignment without sacrificing performance on conflicting human preferences.
Zirui Li, Xinghao Chen, Lingyu Jiang et al.
PVIR introduces the first physics-aware benchmark for video object removal, forcing models to preserve physical consistency such as shadows and reflections. Critical for realistic video editing in production systems.