Topic

Multimodal Understanding

Cross-modal understanding across text, image, video, and audio.

4 papers · latest 2026-04-22

Most active fields for this topic

Multimodal (2) · NLP (1) · Reasoning & Agents (1)

Multimodal Transformer for Sample-Aware Prediction of Metal-Organic Framework Properties

Seunghee Han, Jaewoong Lee, Jihan Kim

breakthrough · 🔴 Advanced · Multimodal · Multimodal Understanding

cs.AI

A multimodal Transformer that models sample-level variability in metal-organic frameworks (MOFs), not just framework identity, so that properties can be predicted for real experimental samples rather than idealized structures.

Cross-Modal Bayesian Low-Rank Adaptation for Uncertainty-Aware Multimodal Learning

Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin

breakthrough · 🔴 Advanced · Multimodal · Multimodal Understanding · Model Compression

cs.LG · cs.AI

CALIBER introduces cross-modal Bayesian low-rank adaptation for uncertainty-aware multimodal learning, targeting robust, parameter-efficient fine-tuning in low-resource settings where training data is scarce.

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents

Joongmin Shin, Chanjun Park, Jeongbae Park et al.

breakthrough · 🟡 Intermediate · NLP · RAG · Multimodal Understanding

cs.AI · cs.CL

MultiDocFusion combines vision and text in a hierarchical chunking pipeline that preserves the structural context of long industrial documents, improving RAG accuracy for question answering over complex PDFs, manuals, and reports.

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Shilin Yan, Jintao Tong, Hongwei Xue et al.

breakthrough · 🔴 Advanced · Reasoning & Agents · AI Agents · Multimodal Understanding

cs.CV · cs.AI

Act Wisely separates the task-accuracy reward from the tool-efficiency reward so that multimodal agents learn when not to call tools, cutting unnecessary invocations by orders of magnitude while improving accuracy and reducing latency and cost.
