Topic
Vision-Language Models
Vision-language models that connect text and perception.
3 papers · latest 2026-04-23
Yupeng Zheng, Xiang Li, Songen Gu et al.
Presents a lightweight vision-language-action (VLA) model that integrates world knowledge for efficient robot manipulation, improving spatial reasoning and task execution in compact robotic systems.
Kaiqi Hu, Linda Xiao, Shiyue Xu et al.
Introduces the first rigorous benchmark testing whether VLMs truly understand candlestick patterns rather than merely correlating with them, a distinction that matters for financial AI builders who rely on visual interpretation of market signals.
Jiajun Zhai, Hao Shi, Shangwei Guo et al.
E-VLA uses event cameras, normally used in robotics, to let robots see and act in near-total darkness or under motion blur, where conventional cameras fail. This lets real-world robotic systems operate reliably in challenging environments such as smoke-filled rooms or fast-moving scenes.