Giacomo’s Picks
DAILY WACV, Sunday

Giacomo Pacini is a PhD student at the University of Pisa and an associate researcher in Multimodal AI at CNR-ISTI, Italy.

“My research centers on bridging the gap between image and text modalities. My recent work focuses on image captioning, multimodal information retrieval, and finding new applications for unsupervised vision backbones. Currently, I am studying how to better exploit the semantic representations of vision encoders in Large Multimodal Models. When I am not doing research, I love developing apps, tinkering with home automation, and learning new things.”

“My work at WACV this year introduces a training-free pipeline for class-agnostic counting. We leverage fully unsupervised vision backbones to extract features and perform zero-shot object counting. It's a simple but highly effective approach that eliminates the huge costs of data collection and parameter tuning for unseen object classes!”

“I hope everyone attending WACV here in sunny Tucson has a fantastic time! Don't forget to check out the amazing sessions today, and feel free to stop by my poster this morning (#77) if you need help counting saguaros!”

Giacomo’s picks of the day:
For today, Sunday 8

Orals
13:45-14:45 MageBench: Bridging Large Multimodal Models to Agents
13:45-14:45 You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal LLMs with Answer Extraction
see full review on page 4
15:00-16:00 UniCoRN: Latent Diffusion-based Unified Controllable Image ...

Posters
Poster Session 1-37: M4U: Evaluating Multilingual Understanding and Reasoning ...
Poster Session 1-64: Prompt-OT: An Optimal Transport Regularization Paradigm ...
Poster Session 1-69: PaRaChute: Pathology-Radiology Cross-Modal Fusion for ...