Computer Vision News - December 2025

Congrats, Doctor Ivona!

Ivona Najdenkoska recently completed her PhD at the University of Amsterdam, under the supervision of Marcel Worring and Yuki Asano. She worked on multimodal foundation models and generative AI, investigating how these systems combine information from different modalities. In particular, her work studies how richer context - visual, textual, or both - can strengthen multimodal understanding, generation, and alignment. Ivona will continue her research at UvA as a postdoc. Congratulations, Doctor Ivona!

The idea that motivated much of her thesis is simple: humans rarely rely on a single cue when understanding the world. We look at what surrounds an object, at earlier examples, at prior demonstrations of a task, at object relationships, and at how images and text complement each other. Multimodal foundation models should ideally learn to use context in the same way.

Her thesis begins with the challenge of learning from only a few image–caption examples as context. Language models often rely on hand-engineered task instructions that guide the model toward the correct task. The first chapter introduces a meta-learning approach that makes the task instruction itself learnable. It leverages frozen vision and language backbones connected through a lightweight module named the meta-mapper. This allows quick model adaptation from limited demonstrations, showing that even frozen models can be far more flexible than expected.

A similar type of context appears in her work on diffusion models for image generation. These models are typically guided by carefully crafted text prompts, yet many visual concepts (styles, color palettes, object arrangements) are hard to describe in words. Ivona introduced Context Diffusion, a framework that lets diffusion models learn from examples provided as context. Instead of using only text prompts, users can show the model a few images or combine them with text.
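To make the idea of example-based guidance concrete, here is a minimal toy sketch of how a generative model might fuse an optional text prompt with optional context images into a single conditioning signal. This is not the Context Diffusion implementation: the encoders are stand-in random projections, and all names and dimensions (`build_conditioning`, `D_COND`, etc.) are illustrative assumptions, shown only to convey the general conditioning pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; these do not come from the actual Context Diffusion work.
D_IMG, D_COND = 64, 32

# Stand-in for a frozen visual encoder: a fixed random projection.
PROJ = rng.normal(size=(D_IMG, D_COND)) / np.sqrt(D_IMG)

def embed_text(prompt):
    # Toy deterministic "text encoder": seed a generator from the prompt bytes.
    seed = int.from_bytes(prompt.encode()[:4].ljust(4, b"\0"), "little")
    return np.random.default_rng(seed).normal(size=D_COND)

def embed_image(image):
    # Toy visual encoder: project flattened image features.
    return image @ PROJ

def build_conditioning(prompt=None, context_images=()):
    """Fuse an optional text prompt and optional context images into one
    conditioning vector, so generation can be guided by either or both."""
    parts = []
    if prompt is not None:
        parts.append(embed_text(prompt))
    if len(context_images):
        # Average the embeddings of the few context examples.
        parts.append(np.mean([embed_image(im) for im in context_images], axis=0))
    if not parts:
        return np.zeros(D_COND)   # unconditional generation
    return np.mean(parts, axis=0)  # simple fusion by averaging

# Usage: guide generation with a prompt, context images, or both.
ctx = [rng.normal(size=D_IMG) for _ in range(3)]
c_text = build_conditioning(prompt="a watercolor landscape")
c_both = build_conditioning(prompt="a watercolor landscape", context_images=ctx)
```

The point of the sketch is the interface: the same conditioning slot accepts text, images, or their combination, which is what lets a user show the model a style or arrangement that would be hard to put into words.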

RkJQdWJsaXNoZXIy NTc3NzU=