AI Research Highlights
Wednesday, April 8, 2026
David Picard, Nicolas Dufour, Lucas Degeorge et al.
PoM replaces attention with a linear-time polynomial mixer that preserves universal approximation while sharply cutting compute, a big win for scaling vision and language models on edge devices.
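To make the idea concrete, here is a minimal NumPy sketch of a degree-2 polynomial token mixer with cost linear in sequence length: each token contributes a quadratic feature to one global summary vector, which is then redistributed to every token. The shapes, random projections, and exact mixing form are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def polynomial_mixer(x):
    """Degree-2 polynomial token mixer, O(n) in sequence length n.

    Illustrative sketch: random projections stand in for learned
    weights, and the mixing form is an assumption, not PoM's exact
    layer.
    """
    n, d = x.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((d, d)) / np.sqrt(d)
    B = rng.standard_normal((d, d)) / np.sqrt(d)
    C = rng.standard_normal((d, d)) / np.sqrt(d)
    # Global degree-2 summary: a single pass over tokens, so the cost
    # grows linearly with n (unlike attention's n^2 token pairs).
    m = ((x @ A) * (x @ B)).sum(axis=0) / n  # shape (d,)
    # Redistribute: every token is modulated by the shared summary.
    return x + ((x @ C) * m) / np.sqrt(d)

x = np.random.default_rng(1).standard_normal((16, 8))
y = polynomial_mixer(x)
```

Contrast with attention: there is no n-by-n token-pair score matrix anywhere, which is where the linear-time claim comes from.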
Gustav Keppler, Moritz Gstür, Veit Hagenmeyer
CritBench is the first benchmark evaluating LLM agents on operational technology (OT) protocols such as IEC 61850, exposing critical cybersecurity gaps in industrial systems. Essential for safely deploying LLMs in critical infrastructure.
Bowen Ye, Rang Li, Qibin Yang et al.
Claw-Eval introduces transparent, safety-aware, multimodal evaluation for autonomous agents, closing critical benchmarking gaps. Essential for building trustworthy, real-world AI agents.
Xiaojie Gu, Ziying Huang, Weicong Hong et al.
Exposes how LLMs mimic edits without truly updating stored knowledge, revealing dangerous surface compliance. Vital for builders deploying knowledge-editing tools where factual reliability is non-negotiable.
Tianyi Zhao, Yinhan He, Wendy Zheng et al.
MCircKE mechanistically edits LLM knowledge to fix reasoning gaps, ensuring edited facts propagate through multi-step reasoning chains for reliable deployment.
Eranga Bandara, Ross Gore, Sachin Shetty et al.
Agentic AI automates end-to-end retail supply chains with real-world coordination, reducing manual labor at scale and proving LLM agents can reliably drive high-stakes operational workflows.
Yulin Zou, Yan Chen, Wenyan Chen et al.
CoStream jointly optimizes the video codec and multimodal inference to cut computational costs by more than 40%, enabling scalable, real-time video analytics without sacrificing vision-language model accuracy.
Guhao Feng, Shengjie Luo, Kai Hua et al.
In-Place Test-Time Training lets LLMs adapt their weights during inference, overcoming the limits of static deployment. Vital for real-time systems that need continuous learning from streaming data without retraining.
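The core intuition, updating weights while serving predictions, can be sketched with a toy online learner: a linear model predicts each incoming example, then immediately takes one gradient step on it, so later predictions benefit from earlier stream data. This illustrates the broad idea of test-time training only; the model, loss, and update rule here are assumptions, not the paper's In-Place TTT algorithm.

```python
import numpy as np

def stream_with_ttt(xs, ys, lr=0.05):
    """Generic test-time training sketch: predict, then update.

    Weights change in place during inference, so accuracy improves
    as the stream progresses, with no offline retraining step.
    """
    w = np.zeros(xs.shape[1])
    preds = []
    for x, y in zip(xs, ys):
        pred = w @ x             # inference with current weights
        preds.append(pred)
        grad = (pred - y) * x    # squared-error gradient on this example
        w -= lr * grad           # in-place weight update mid-stream
    return np.array(preds), w

# Stream drawn from a fixed linear rule: errors shrink as w adapts.
rng = np.random.default_rng(0)
true_w = rng.standard_normal(4)
xs = rng.standard_normal((500, 4))
ys = xs @ true_w
preds, w = stream_with_ttt(xs, ys)
early_mse = np.mean((preds[:50] - ys[:50]) ** 2)
late_mse = np.mean((preds[-50:] - ys[-50:]) ** 2)
```

Because the update happens between predictions rather than in a separate training phase, late-stream error is far below early-stream error, which is the behavior static deployment cannot provide.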
Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo et al.
LLM4CodeRE adapts LLMs specifically to malware decompilation, significantly improving reverse-engineering accuracy on obfuscated code. Critical for automated threat analysis in cybersecurity operations.
Maria Nesterova, Mikhail Kolosov, Anton Andreychuk et al.
A single GPT-based model learns diverse multi-agent reinforcement learning (MARL) tasks, eliminating task-specific architectures and enabling scalable, generalizable multi-agent systems without per-environment retraining.
Nirajan Acharya, Gaurav Kumar Gupta
First formal security framework for agents built on the Model Context Protocol (MCP), defining threats and verifiable defenses. Essential for builders deploying LLM agents with external tool access in production environments.
Hiba Dahmani, Nathan Piasco, Moussab Bennehar et al.
SEM-ROVER generates scalable, geometrically coherent 3D driving scenes via semantic voxel-guided diffusion, supporting realistic, large-scale simulation for autonomous driving systems without view limitations.
Wang Yang, Chaoda Song, Xinpeng Li et al.
ACE-Bench cuts agent-evaluation overhead by 41% with controllable, scalable tasks, enabling reliable, repeatable benchmarking of LLM agents for real-world deployment.
Renxuan Tan, Rongpeng Li, Zhifeng Zhao et al.
Introduces Pareto-lenient consensus to avoid premature convergence in multi-preference LLM alignment, enabling robust, nuanced value alignment without sacrificing performance on conflicting human preferences.
Zirui Li, Xinghao Chen, Lingyu Jiang et al.
PVIR introduces the first physics-aware benchmark for video object removal, forcing models to preserve physical consistency such as shadows and reflections. Critical for realistic video editing in production systems.