Topic

Multimodal Understanding

Cross-modal understanding across text, image, video, and audio.

1 paper · latest 2026-04-10

Most active fields for this topic

Shilin Yan, Jintao Tong, Hongwei Xue et al.

cs.CV · cs.AI

Act Wisely separates task-accuracy rewards from tool-efficiency rewards so that multimodal agents learn when not to call tools, cutting unnecessary invocations by orders of magnitude while improving accuracy and reducing latency and cost.

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.