CVPR Daily - Friday

Congrats, Doctor Gemma!

Gemma's bio is on page 8. Here is a review of her thesis, two weeks after her successful defense.

Recent advancements in deep learning have transformed the field of image generation, enabling the creation of highly realistic and visually compelling images. However, despite their impressive capabilities, state-of-the-art models often lack the fine-grained control needed to tailor outputs precisely. This challenge is particularly evident when user input is ambiguous or when multiple constraints must be satisfied simultaneously.

Addressing these limitations, Gemma's work explores novel methods to constrain and guide the image generation process by leveraging multimodal inputs such as sketches, style, text, and exemplars. Based on the success of DALL-E, her first approach was CoGS (ECCV 2022), a framework for style-conditioned, sketch-driven image synthesis. By decoupling structure and appearance, CoGS empowers users to define coarse layouts via sketches and class labels, and to guide aesthetics using exemplar style images. A transformer-based encoder converts these inputs into a discrete codebook representation, which can be mapped into a metric space for fine-grained adjustments. This unification of search and synthesis allows iterative refinement, enabling users to explore diverse appearance possibilities and produce results that closely match their vision.

Building on this idea, PARASOL (CVPR WiCV 2024) advances control by enabling disentangled, parametric control of the visual style. This multimodal synthesis model conditions a latent diffusion framework on both content and fine-grained style embeddings, ensuring
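The idea of conditioning a denoiser on separate content and style embeddings, with a user-tunable style weight, can be pictured with a minimal NumPy sketch. This is purely illustrative and not the PARASOL implementation: the function name `denoise_step`, the toy random "network", and the embedding sizes are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, content_emb, style_emb, w_style=1.0):
    """One hypothetical denoising update conditioned on separate
    content and style embeddings (illustrative toy, not PARASOL)."""
    # Disentangled control: the style embedding enters through its
    # own scalar weight, independent of the content embedding.
    cond = np.concatenate([content_emb, w_style * style_emb])
    # Toy "network": a small fixed random projection of [latent, cond].
    W = rng.standard_normal((latent.size, latent.size + cond.size)) * 0.01
    return latent - W @ np.concatenate([latent, cond])

latent = rng.standard_normal(16)   # noisy latent code
content = rng.standard_normal(8)   # content embedding (e.g. from an image encoder)
style = rng.standard_normal(8)     # style embedding (e.g. from a style encoder)

out = denoise_step(latent, content, style, w_style=0.5)
print(out.shape)
```

Because the style embedding is injected separately, dialing `w_style` up or down changes the stylistic influence without touching the content conditioning, which is the parametric control the paragraph describes.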
