Topic

AI Agents

Agentic systems, multi-agent coordination, and task planning.

37 papers · latest 2026-04-14

Yijuan Liang, Xinghao Chen, Yifan Ge et al.

breakthrough · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI

A unified 22k-tool, 390k-example tool-use stack that standardizes data and evaluation and lets an 8B model beat major commercial models on hard, distractor-heavy tool calling.

Haoran Ding, Zhaoguo Wang, Haibo Chen

breakthrough · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.SE · cs.AI

This brings Hoare-style reasoning to 143k-line systems by inferring specs from caller intent, surfacing 522 new bugs in already-tested codebases.

Xiaomeng Hu, Yinger Zhang, Fei Huang et al.

breakthrough · 🟡 Intermediate · Reasoning & Agents · AI Agents · World Models
cs.CL

OccuBench is a 100-scenario benchmark for professional agents across 65 domains that also injects hidden environment faults, exposing how brittle frontier models still are in real work settings.

Jinhua Wang, Biswa Sengupta

breakthrough · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.SE · cs.AI

This benchmark-driven translation of a production AI coding agent from Rust to Python shows how LLMs can migrate large systems continuously while staying competitive on real agent benchmarks.

CocoaBench Team, Shibo Hao, Zhining Zhang et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.CL · cs.AI

CocoaBench is a strong reality check for unified digital agents, with long-horizon tasks that force systems to combine vision, search, and coding in one workflow.

Lei Xiong, Huaying Yuan, Zheng Liu et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI

PaperScope evaluates agentic deep research across multiple scientific papers, tables, and figures, exposing how hard real multi-document synthesis still is.

S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis et al.

cs.AI

By reusing one small model as summarizer, agent, and isolated code reviewer, this inference-time scaffold roughly doubles AppWorld performance on a single 24GB GPU.
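The role-reuse idea can be sketched generically: one model endpoint answers under three different role prompts, with the reviewer seeing only the proposed action so its judgment stays isolated from the agent's context. This is a minimal illustration under assumed prompts and control flow, not the paper's actual scaffold; `call_model` is a stub standing in for a single small-model endpoint.

```python
def call_model(role_prompt: str, content: str) -> str:
    """Stand-in for one small local model serving every role.
    A real scaffold would send role_prompt + content to the same endpoint."""
    return f"[{role_prompt[:12]}] {content[:40]}"

ROLES = {
    "agent":      "You act in the environment. Propose the next action.",
    "reviewer":   "You only review code. Approve or reject the action.",
    "summarizer": "You compress the interaction history into a brief state.",
}

def step(history: list[str], observation: str) -> str:
    # One model, three roles: summarize history, act, then review in isolation.
    summary = call_model(ROLES["summarizer"], " | ".join(history))
    action = call_model(ROLES["agent"], f"{summary}\nObs: {observation}")
    # The reviewer sees only the proposed action, not the full context.
    verdict = call_model(ROLES["reviewer"], action)
    return action if "reject" not in verdict.lower() else "noop"
```

The isolation matters: because the reviewer role never sees the agent's reasoning, it cannot be anchored by it, which is the property the scaffold relies on.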

Ningyan Zhu, Huacan Wang, Jie Zhou et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI

SemaClaw frames harness engineering as the real differentiator for personal AI agents, focusing on the infrastructure layer that turns raw models into auditable systems.

Xiaozhe Li, Tianyi Lyu, Yizhao Yang et al.

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI

A small RL-trained ContextCurator learns to trim noisy history while preserving reasoning anchors, boosting long-horizon agents and slashing token use up to 8x.

Xing Zhang, Guanghui Wang, Yanwei Cui et al.

significant · 🟢 Beginner · Reasoning & Agents · AI Agents
cs.AI · cs.CL

A rare large-scale study of CLAUDE.md-style rules finds that negative constraints help coding agents while many positive instructions quietly hurt them.

Ziqian Zhong, Shashwat Saxena, Aditi Raghunathan

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI

Hodoscope uses unsupervised behavior monitoring to surface novel agent exploits and cut review effort by 6x to 23x, making it a practical safety layer for red teams and benchmark maintainers.

Dhruv Atreja, Julia White, Nikhil Nayak et al.

breakthrough · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI · cs.CL · cs.LG

Pioneer Agent turns small-model adaptation into an automated closed loop that diagnoses failures, curates new data, retrains under regression constraints, and materially improves production-style tasks.

Kaiyang Qian, Xinmin Fang, Zhengxiong Li

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.MA · cs.AI

MPAC proposes a real coordination protocol for multi-owner agent systems, adding structured conflict handling and governance so agents can safely share state instead of silently clobbering each other.

Tiantian He, Yihang Chen, Keyue Jiang et al.

significant · 🔴 Advanced · Reasoning & Agents · Tool Use · AI Agents
cs.AI

EE-MCP shows how MCP-plus-GUI agents can self-improve by generating environments, synthesizing gap tasks, and accumulating reusable experience, with clear gains across desktop apps.

Siyuan Xu, Shiyang Li, Xin Liu et al.

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI

COVERT turns synthetic tool-use data into reward-checkable RL environments, making it much easier to harden agent tool calling against ambiguity, distractor tools, and noisy outputs.

Yucheng Shen, Jiulong Wu, Jizhou Huang et al.

significant · 🔴 Advanced · Reasoning & Agents · RAG · AI Agents
cs.CV · cs.AI

VISOR pushes visual RAG toward real agent behavior with iterative search, evidence-space tracking, and drift control for long-horizon multimodal question answering over documents.

Mohamed Elfeki, Tu Trinh, Kelvin Luu et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI

HiL-Bench measures whether agents know when to ask for missing information, exposing a major reliability gap that standard pass/fail coding benchmarks mostly hide.

Yushi Feng, Junye Du, Qifan Wang et al.

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.LG · cs.AI

CORA adds conformal risk control to mobile GUI agents so teams can set explicit harm budgets and abstain before risky clicks instead of trusting heuristic guardrails.
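The abstention idea can be sketched simply, assuming a scalar risk score per candidate action: calibrate a threshold on held-out actions so the empirical harm rate among accepted actions stays within the budget, then abstain above it at runtime. This is a simplified illustration; it omits the finite-sample correction that real conformal risk control uses and is not CORA's exact procedure.

```python
import math

def calibrate_threshold(cal_scores, cal_harm, budget=0.05):
    """Pick the loosest score threshold whose empirical harm rate on the
    calibration set stays within budget.
    cal_scores: risk scores for calibration actions (higher = riskier)
    cal_harm:   1 if the calibration action actually caused harm, else 0
    """
    paired = sorted(zip(cal_scores, cal_harm))
    best = -math.inf  # if no threshold is safe, abstain on everything
    harms = 0
    for i, (score, harm) in enumerate(paired, start=1):
        harms += harm
        # Acting on every action scored <= `score` would have yielded
        # harms/i harm rate on the calibration set.
        if harms / i <= budget:
            best = score
    return best

def act_or_abstain(score, threshold):
    # At runtime: act only when the score clears the calibrated threshold.
    return "act" if score <= threshold else "abstain"
```

For example, with calibration scores `[0.1, 0.2, 0.9]`, harm labels `[0, 0, 1]`, and a 5% budget, the threshold lands at 0.2, so a runtime score of 0.5 triggers abstention.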

Jingyu Zhang, Tianjian Li, William Jurayj et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.CL · cs.AI

Many-Tier Instruction Hierarchy shows today's agents break down when instruction privilege gets more granular, making it a useful stress test for serious multi-tool and multi-role deployments.

Suhana Bedi, Ryan Welch, Ethan Steinberg et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI

HealthAdminBench gives computer-use agents a rare end-to-end GUI benchmark in a real workflow domain and shows that strong subtask scores still collapse into poor task completion.

Tanmay Gupta, Piper Wolters, Zixian Ma et al.

breakthrough · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.CV

Open 4B and 8B visual web agents, plus a large mixed training set, that beat comparable open agents and some larger closed systems, giving builders a reproducible browser-automation stack with no dependence on HTML or accessibility trees.

Yuxuan Zhang, Yubo Wang, Yipeng Zhu et al.

breakthrough · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.CL · cs.AI

A live-web benchmark across 144 production sites and everyday tasks, showing frontier agents still complete only a small slice of real user workflows and giving builders a far more realistic yardstick than sandboxed browser evals.

Shilin Yan, Jintao Tong, Hongwei Xue et al.

cs.CV · cs.AI

Act Wisely separates task accuracy from tool-efficiency rewards so multimodal agents learn when not to call tools, cutting unnecessary invocations by orders of magnitude while improving accuracy, latency, and cost.

Boyang Zhang, Sebastián G. Acosta, Preston Carlson et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.CV

ParseBench is a 2,000-page enterprise document benchmark that scores tables, charts, formatting, faithfulness, and grounding the way agents actually need them, exposing why text-similarity metrics miss business-critical parsing failures.

Tongbo Chen, Zhengxi Lu, Zhan Xu et al.

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI

KnowU-Bench evaluates personalized mobile agents in live GUI environments, including when to ask, act, or stay silent, which is much closer to real assistant behavior than static preference benchmarks.

Khushal Sethi

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI · cs.CL · cs.MA

TrACE spends extra rollouts only on uncertain agent steps, matching fixed self-consistency accuracy with far fewer model calls and offering an easy path to cheaper agent inference.
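The uncertainty-gated idea can be sketched generically: draw a few rollouts for a step and spend more only when they disagree. This is a minimal illustration of adaptive self-consistency under assumed parameters, not TrACE's actual gating rule; `answer_step` is a random stub standing in for a model call.

```python
import random
from collections import Counter

def answer_step(prompt: str) -> str:
    # Stand-in for a model call; returns a candidate action for one step.
    return random.choice(["A", "A", "A", "B"])

def adaptive_consistency(prompt, n_min=3, n_max=9, agree_frac=1.0):
    """Majority-vote a step's answer, sampling beyond n_min rollouts
    only when the initial samples fall below the agreement threshold."""
    samples = [answer_step(prompt) for _ in range(n_min)]
    top, count = Counter(samples).most_common(1)[0]
    if count / len(samples) >= agree_frac:
        return top, len(samples)  # confident step: stop early, save calls
    # Uncertain step: spend the remaining rollout budget here.
    samples += [answer_step(prompt) for _ in range(n_max - n_min)]
    top, _ = Counter(samples).most_common(1)[0]
    return top, len(samples)
```

On mostly easy steps this stops at `n_min` calls and only pays the full `n_max` on contested ones, which is where the claimed savings over fixed self-consistency come from.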

Guo Gan, Yuxuan Ding, Cong Chen et al.

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.LG · cs.AI

Reframes online agent RL as single-state multi-action learning, boosting Android agent success while reducing expensive emulator waste, which is useful for training UI agents under tight latency and budget constraints.

Yu Li, Sizhe Tang, Tian Lan

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI · cs.LG

Builds a cognitive tree across multi-turn trajectories to assign credit at the step level, improving policy optimization for reasoning, planning, and interactive agents with long sparse-reward chains.

Seongwoo Jeong, Seonil Son

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI · cs.CL

Shows explicit world models and symbolic reflection do most of the work in a self-revising agent, suggesting many agent stacks can trade extra model calls for better runtime structure.

Eranga Bandara, Ross Gore, Sachin Shetty et al.

breakthrough · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI

An agentic AI system automates end-to-end retail supply chains with real-world coordination, reducing manual labor at scale and showing that LLM agents can reliably drive high-stakes operational workflows.

Bowen Ye, Rang Li, Qibin Yang et al.

cs.AI

Claw-Eval introduces transparent, safety-aware, multimodal evaluation for autonomous agents, addressing benchmarking gaps that matter for building trustworthy, real-world AI agents.

Maria Nesterova, Mikhail Kolosov, Anton Andreychuk et al.

breakthrough · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.AI

A single GPT-based model learns diverse MARL tasks, eliminating task-specific architectures and enabling scalable, generalizable multi-agent systems without retraining for each environment.

Wang Yang, Chaoda Song, Xinpeng Li et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.AI · cs.CL

ACE-Bench reduces agent evaluation overhead by 41% with controllable, scalable tasks, enabling reliable, repeatable benchmarking of LLM agents for real-world deployment.

Nirajan Acharya, Gaurav Kumar Gupta

breakthrough · 🔴 Advanced · Reasoning & Agents · AI Agents · Tool Use
cs.CR · cs.AI

First formal security framework for MCP-based AI agents, defining threats and verifiable defenses. Essential for builders deploying LLM agents with external tool access in production environments.

Guan-Ting Lin, Chen Chen, Zhehuai Chen et al.

significant · 🟡 Intermediate · Reasoning & Agents · Tool Use · AI Agents
cs.CL

Voice agents often fail when users stutter, pause, or interrupt, leading to broken API calls and frustrated users. This benchmark uses real human speech to reveal exactly how top models handle these messy realities. It allows developers to test if their voice systems can actually execute tasks reliably in natural conversation.

Rafael O. Jarczewski, Gabriel U. Talasso, Leandro Villas et al.

significant · 🔴 Advanced · Reasoning & Agents · AI Agents
cs.MA · cs.AI

Agentic Federated Learning uses AI agents to dynamically manage distributed training across unreliable devices. This matters because it makes privacy-preserving AI training faster and more reliable in real-world settings like mobile networks or hospitals with spotty connectivity.

Chenxi Wang, Zhuoyun Yu, Xin Xie et al.

significant · 🟡 Intermediate · Reasoning & Agents · AI Agents
cs.CL · cs.AI · cs.IR

SkillX creates a shared knowledge base of skills that allows AI agents to learn from each other's experiences rather than starting from scratch. This prevents redundant exploration and speeds up the development of capable agents. Builders can reuse these skills across different projects, significantly cutting down training time and costs.

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.