Computer Vision News - March 2024

ICLR Paper Presentation

“We also propose a multistep unrolling strategy,” Yumeng explains. “Instead of just applying the loss at one single timestep, we unroll according to the iterative denoising process at the inference time. We mimic this process during training to encourage alignment across different timesteps.”

Training instability posed a significant challenge, caused both by the introduction of a discriminator and by the varying noise levels at different timesteps. To address this, the team proposed sparse unrolling: applying the unrolling strategy only intermittently, which reduces training cost while maintaining stability.

Looking ahead, Yumeng emphasizes that the approach applies well beyond layout-to-image (L2I) generation. “Multistep unrolling is not specific to layout-to-image generation,” the researcher points out. “It can be generally applied to improve diffusion model training. As diffusion models apply an iterative
