

Computer Vision

Image, video, and 3D perception plus visual generation.

7 papers · latest 2026-04-10


Ziwei Zhou, Zeyuan Lai, Rui Wang et al.

significant · 🟡 Intermediate · Computer Vision · Video Generation
cs.CV · cs.AI · cs.CL

AVGen-Bench finds that today's text-to-audio-video generation systems, despite their impressive outputs, remain semantically unreliable, especially for speech, text rendering, physical reasoning, and musical pitch control.

InSpatio Team, Donghui Shen, Guofeng Zhang et al.

significant · 🔴 Advanced · Computer Vision · Video Generation
cs.CV

A real-time 4D world simulator, built from a single video, that emphasizes spatial consistency and controllable interaction. It points toward more usable interactive environments for embodied-agent training and evaluation.

Hiba Dahmani, Nathan Piasco, Moussab Bennehar et al.

breakthrough · 🔴 Advanced · Computer Vision · Diffusion Models
cs.CV

SEM-ROVER enables scalable, geometrically coherent 3D driving-scene generation via semantic voxel-guided diffusion, supporting realistic, large-scale simulation for autonomous driving systems without view limitations.

Zirui Li, Xinghao Chen, Lingyu Jiang et al.

breakthrough · 🔴 Advanced · Computer Vision · Video Generation
cs.CV

PVIR introduces the first physics-aware benchmark for video object removal. It requires models to preserve physical consistency, such as shadows and reflections, which is critical for realistic video editing in production systems.

Haoxuan Han, Weijie Wang, Zeyu Zhang et al.

breakthrough · 🟡 Intermediate · Computer Vision · 3D Vision
cs.CV

DDP shows that deliberately blurring images can help models answer visual questions more accurately by making them focus on core structures rather than distracting details. This runs against conventional wisdom: less visual detail can mean better performance, and the approach is easy to plug into existing VQA systems.

Hyunsoo Cha, Wonjung Woo, Byungjun Kim et al.

significant · 🔴 Advanced · Computer Vision · Diffusion Models
cs.CV

Vanast eliminates the need for separate try-on and animation steps by performing both in a single stage, reducing distortions and identity drift. As a result, it can generate realistic, coherent videos of a person wearing new clothes from just one image, which is useful for e-commerce and virtual fashion without complex pipelines.

Ahan Shabanov, Peter Hedman, Ethan Weber et al.

significant · 🔴 Advanced · Computer Vision · 3D Vision
cs.CV

This paper builds 3D scenes without a rigid grid structure, yielding more efficient and detailed models from just a few photos. It addresses missing data in unobserved regions by generating plausible details rather than leaving gaps. Practitioners can use it to create lighter, faster 3D assets for games or VR without extensive camera rigs.

© 2026 A2A.pub — AI to Action. From papers to practice, daily.
Summaries are AI-assisted.