Topic

Diffusion Models

Diffusion-based generation for images, video, and multimodal outputs.

7 papers · latest 2026-04-23

Most active fields for this topic

Computer Vision · 5
Multimodal · 1
Reasoning & Agents · 1

GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

Yuxuan Xue, Ruofan Liang, Egor Zakharov et al.

significant · 🔴 Advanced · Computer Vision · Diffusion Models · 3D Vision

cs.CV

Presents GeoRelight, a unified framework for joint geometrical relighting and 3D reconstruction using diffusion transformers, improving physical consistency and reducing error accumulation in single-image relighting.


Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

Chaojie Mao, Chen-Wei Xie, Chongyang Zhong et al.

breakthrough · 🔴 Advanced · Computer Vision · Diffusion Models

cs.CV

Wan-Image shifts image generation from aesthetic synthesis toward professional-grade control, with precise typography, identity preservation, and workflow integration aimed at designers and product builders who need pixel-accurate outputs.


EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation

Ruibing Hou, Mingyue Zhou, Yuwei Gui et al.

breakthrough · 🔴 Advanced · Reasoning & Agents · LLM Reasoning · Diffusion Models

cs.CV

EgoMotion introduces the first diffusion-based framework for egocentric vision-language motion generation. It synthesizes realistic 3D human motion from first-person views, with applications in immersive VR, robotics, and human-robot interaction.


MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation

Tanjim Rahaman Fardin, S M Zunaid Alam, Mahadi Hasan Fahim et al.

breakthrough · 🔴 Advanced · Computer Vision · Diffusion Models

cs.CV

MetaCloak-JPEG produces adversarial perturbations that survive JPEG compression and still block unauthorized DreamBooth deepfakes. This matters for real-world privacy protection, where images are routinely shared in degraded formats.


Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data

Farbod Alinezhad, Jianfei Cao, Gary J. Young et al.

breakthrough · 🔴 Advanced · Multimodal · Diffusion Models

cs.LG

CDM (Causal Diffusion Models) is the first diffusion model for counterfactual longitudinal outcomes. It predicts treatment effects with quantified uncertainty, which is relevant to clinical decision systems and causal AI in healthcare.


SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

Hiba Dahmani, Nathan Piasco, Moussab Bennehar et al.

breakthrough · 🔴 Advanced · Computer Vision · Diffusion Models

cs.CV

SEM-ROVER uses semantic voxel-guided diffusion to generate scalable, geometrically coherent 3D driving scenes, supporting realistic, large-scale simulation for autonomous driving systems without view limitations.


Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Hyunsoo Cha, Wonjung Woo, Byungjun Kim et al.

significant · 🔴 Advanced · Computer Vision · Diffusion Models

cs.CV

Vanast performs virtual try-on and human image animation in a single step rather than as separate stages, reducing distortions and identity drift. It can generate realistic, coherent videos of people wearing new clothes from just one image, which is useful for e-commerce and virtual fashion without complex pipelines.
