Topic

Alignment & Safety

Alignment, preference learning, robustness, and safe deployment.

8 papers · latest 2026-04-13

Nastaran Darabi, Amit Ranjan Trivedi

cs.RO · cs.CL · cs.CV

ProGAL-VLA adds verified grounding and prospective sub-goals to VLA robots, sharply improving instruction sensitivity, ambiguity handling, and robustness under perturbation.
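The verify-then-act pattern behind a claim like this can be sketched generically: before committing to a prospective sub-goal, check that every object it references is grounded in the current observation, and re-plan otherwise. The sketch below shows that generic pattern, not ProGAL-VLA's actual pipeline; `observe`, `detect_objects`, `plan_subgoals`, and `execute` are hypothetical robot-stack hooks.

```python
# Generic verify-then-act loop for grounded sub-goals (a sketch, not
# ProGAL-VLA's implementation). All four callables are hypothetical hooks.
def run_instruction(instruction, observe, detect_objects,
                    plan_subgoals, execute, max_replans=3):
    for _ in range(max_replans):
        obs = observe()
        subgoals = plan_subgoals(instruction, obs)   # prospective sub-goals
        visible = set(detect_objects(obs))
        # Verified grounding: every object a sub-goal references must be
        # detected in the scene before we commit to the plan.
        if all(set(g.objects) <= visible for g in subgoals):
            for g in subgoals:
                execute(g)
            return True
    return False  # grounding failed; a real system would ask to disambiguate
```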

Qiyao Ma, Dechen Gao, Rui Cai et al.

breakthrough · 🟡 Intermediate · NLP · Alignment & Safety
cs.CL · cs.LG

A benchmark for personalized reward modeling that tracks downstream Best-of-N (BoN) and PPO performance, showing that today's reward models still struggle to capture the user-specific preferences that matter for aligned products.
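For context, the Best-of-N loop such a benchmark measures is simple: sample N candidate completions and let the reward model pick the winner. A minimal sketch follows; `generate` and `reward_model` are hypothetical stand-ins, and a personalized reward model would additionally condition on a user profile.

```python
# Best-of-N (BoN) sampling sketch: the reward model selects the best of N
# sampled completions. Both callables are hypothetical stand-ins.
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],             # samples one completion
    reward_model: Callable[[str, str], float],  # scores (prompt, completion)
    n: int = 8,
) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # A personalized reward model would also take a user profile here.
    scores = [reward_model(prompt, c) for c in candidates]
    return candidates[max(range(n), key=lambda i: scores[i])]
```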

Mohamed Darwish Mounis, Mohamed Mahmoud, Shaimaa Sedek et al.

significant · 🟡 Intermediate · NLP · RAG · Alignment & Safety
cs.IR · cs.CV

Shows that multimodal retrieval is often a query-alignment problem rather than an encoder problem: rewriting image-text queries into retrieval-optimized text beats strong baselines without touching the encoder.
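That idea slots in as a thin stage in front of any frozen retriever, roughly as sketched below; `rewrite_query` and `embed` are hypothetical placeholders, not the paper's components.

```python
# Query-rewriting sketch for multimodal retrieval: rewrite the (image, text)
# query into retrieval-friendly text, then reuse a frozen text retriever.
# `rewrite_query` and `embed` are placeholder functions, not the paper's.
import numpy as np

def retrieve(image_caption: str, text_query: str,
             rewrite_query, embed, corpus_texts, corpus_embs, k=5):
    # e.g. an LLM prompted to turn the multimodal request into a search query
    rewritten = rewrite_query(image_caption, text_query)
    q = embed(rewritten)  # frozen text encoder, unchanged by the method
    sims = corpus_embs @ q / (
        np.linalg.norm(corpus_embs, axis=1) * np.linalg.norm(q) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [(corpus_texts[i], float(sims[i])) for i in top]
```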

Renxuan Tan, Rongpeng Li, Zhifeng Zhao et al.

breakthrough · 🔴 Advanced · NLP · Alignment & Safety · LLM Reasoning
cs.AI

Introduces Pareto-lenient consensus to avoid premature convergence in multi-preference LLM alignment, enabling robust, nuanced value alignment without sacrificing performance on conflicting human preferences.
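The Pareto machinery underneath can be sketched generically: with one score per preference objective, keep every candidate that no other candidate dominates, instead of collapsing scores into a weighted sum too early. This is a standard Pareto-front filter, not the authors' consensus algorithm.

```python
# Generic Pareto-front filter over multi-preference scores (a sketch).
# Keeping all non-dominated candidates lets conflicting preferences
# survive to a later consensus step instead of being averaged away.
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is >= on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

# Candidate 0 trades safety for helpfulness, candidate 1 the reverse;
# neither dominates the other, so both survive: prints [0, 1].
print(pareto_front([(0.9, 0.3), (0.4, 0.8), (0.3, 0.2)]))
```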

Bowen Ye, Rang Li, Qibin Yang et al.

cs.AI

Claw-Eval introduces transparent, safety-aware, multimodal evaluation for autonomous agents, addressing gaps in current benchmarks that matter for building trustworthy, real-world AI agents.

Xiaojie Gu, Ziying Huang, Weicong Hong et al.

breakthrough · 🔴 Advanced · NLP · LLM Reasoning · Alignment & Safety
cs.CL · cs.AI · cs.LG

Shows that LLMs can mimic knowledge edits without genuinely updating what they know, a form of surface compliance that matters to anyone deploying knowledge-editing tools where factual reliability is non-negotiable.
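One simple way to expose that failure mode is to probe an edited model beyond the exact edit prompt, using paraphrases and facts the edit logically implies. The sketch below assumes that framing; `model_answer` is a hypothetical query function, not the paper's harness.

```python
# Surface-compliance probe sketch for knowledge editing. A model that only
# mimics an edit answers the exact edit prompt correctly but fails on
# paraphrases or implied facts. `model_answer` is a hypothetical stand-in.
def edit_generalizes(model_answer, edit_prompt, expected,
                     paraphrases, implications):
    if expected.lower() not in model_answer(edit_prompt).lower():
        return False  # the edit itself did not even take
    probes = list(paraphrases) + [q for q, _ in implications]
    targets = ([expected.lower()] * len(paraphrases)
               + [a.lower() for _, a in implications])
    hits = sum(t in model_answer(q).lower() for q, t in zip(probes, targets))
    return hits == len(probes)  # strict: every probe must reflect the edit
```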

Grace Liu, Brian Christian, Tsvetomira Dumbalska et al.

breakthrough · 🟡 Intermediate · Reasoning & Agents · Alignment & Safety
cs.AI

AI assistants that always answer instantly make users more dependent and worse at reasoning on their own. This is the first solid evidence that a good assistant should sometimes say 'figure it out', a wake-up call for designers of educational and productivity tools.

Alexis Burgon, Berkman Sahiner, Nicholas A Petrick et al.

significant · 🟡 Intermediate · Reasoning & Agents · Alignment & Safety
cs.AI · cs.PF

This work introduces a standardized framework for evaluating AI medical devices that learn and adapt over time, addressing a major regulatory bottleneck. It provides clear metrics to distinguish a model that is genuinely improving from one that is merely memorizing new data, which is critical for getting adaptive AI approved for clinical use.
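A crude version of that improving-versus-memorizing distinction: evaluate each update both on the data it adapted on and on a frozen held-out set, and treat gains that appear only on the adaptation data as memorization. The sketch below assumes that setup; the metric, tolerance, and datasets are illustrative, not the framework's.

```python
# Sketch: separating genuine improvement from memorization in an adaptive
# model. `evaluate` returns a performance score (e.g. AUROC) on a dataset;
# the datasets and the tolerance are illustrative assumptions.
def update_verdict(evaluate, model_before, model_after,
                   adaptation_data, holdout_data, tol=0.0):
    gain_seen = (evaluate(model_after, adaptation_data)
                 - evaluate(model_before, adaptation_data))
    gain_unseen = (evaluate(model_after, holdout_data)
                   - evaluate(model_before, holdout_data))
    if gain_unseen > tol:
        return "improving"    # transfers to cases it never adapted on
    if gain_seen > tol:
        return "memorizing"   # only better on its own adaptation data
    return "no measurable change"
```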

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.