condition the sampling procedure to force the diffusion model to produce some elements from a specific class. One of the key innovations in their work is the normalization of classifier gradients. Controlling these gradients is crucial – if they are too weak or too strong, they can distort the generated image. Their solution balances this influence, optimizing the model’s output.

Jacek highlights two significant challenges they faced: conducting extensive, time-consuming experiments to find the correct model, and understanding and building the theoretical foundation. “We observed that if you train the diffusion model and want to force it to present elements of the given class, then often the resulting image was non-realistic and elements of the class were too big or too artificial,” he tells us. “This suggested that maybe something was wrong with the underlying theory. We started investigating this problem and observed that you can switch from the probabilistic to the geometric perspective.”

ADM-G is based on a probabilistic approach, but the team found that adopting a geometric perspective led to significantly better results. By analyzing the diffusion model in terms of the distance of its trajectory from the data manifold, they developed a model that is easy to implement and outperforms the probabilistic approach.

With diffusion models being one of the hottest topics in modern computer vision research, Jacek believes they were chosen for a coveted oral presentation this year because their work delves deeper into understanding what is happening behind the model.

5 DAILY WACV Saturday GeoGuide
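To make the gradient-normalization idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the team's implementation: the function names, the `scale` parameter, and the choice to rescale the gradient to a fixed length are all hypothetical, and the gradient is a plain list of numbers standing in for a real classifier gradient.

```python
import math


def normalized_guidance(grad, scale=1.0, eps=1e-12):
    """Rescale a classifier gradient to the fixed length `scale`.

    Hypothetical helper: only the direction of the gradient is kept,
    so a very weak or very strong gradient gives the same step size.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm < eps:
        # A vanishing gradient carries no direction, so apply no guidance.
        return [0.0 for _ in grad]
    return [scale * g / norm for g in grad]


def guided_mean(mean, grad, scale=1.0):
    """Shift a denoising-step mean by the fixed-length guidance step."""
    step = normalized_guidance(grad, scale)
    return [m + s for m, s in zip(mean, step)]


# A tiny gradient and a huge one both produce a step of length 0.5.
weak = normalized_guidance([1e-6, 0.0], scale=0.5)
strong = normalized_guidance([3e4, 4e4], scale=0.5)
```

In this toy version, the raw gradient magnitude no longer decides how hard the sample is pushed toward the class; only `scale` does. That is one simple way to keep the guidance from being too weak or too strong.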