DAILY WACV Saturday
Oral Presentation

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Gonzalo Martin Garcia is a Master's student and a research assistant at the RWTH Aachen Computer Vision Group, under the supervision of Karim Abou Zeid and Christian Schmidt. Gonzalo is also the first author of an exceptional paper that was accepted as a poster and oral at WACV 2025.

This work is about fine-tuning diffusion models to perform geometry tasks such as depth and surface normal estimation. Both tasks are very important for robotic navigation, 3D reconstruction, image and video editing, and many other computer vision tasks.

“The novelty of this work,” Gonzalo reveals, “is about repurposing Stable Diffusion as an efficient deterministic geometry estimation model.”

But what was the biggest challenge in doing this? One of the challenges was working with large models. Stable Diffusion is a big model in general, and inference with diffusion models carries its own computational overhead. You have to do multi-step inference, so evaluating these models and running ablations or tests also takes a lot of time.

“I would say if you evaluate a model for 50 steps, evaluating on a single dataset may take a lot of time,” Gonzalo shares. “You cannot really quickly iterate over your work. And I think that's what's nice about our work, where now you can use diffusion models in an end-to-end manner and generate predictions in a single neural pass very quickly!”

In general, we would like to use very powerful models to estimate 3D scenes and obtain new knowledge and new predictions without, for example, the use of sensors and the like.

“In 3D, I'm particularly interested in 3D reconstruction of scenes,” Gonzalo