Computer Vision News - Summer 2025

Yiqing Liang

“…that it’s a more abstract version of the information compared to the raw camera. For example, RGB-D point clouds. It's possible to abstract our output like that.”

Looking at the bigger picture, Yiqing is curious about how this research could intersect with multimodal large language models. “For LLMs, the multimodal side still has a lot to explore,” she points out. “People are interested in how to encode visual information more efficiently, and how to let it interact more with textual information.”

What excites Yiqing most is this model’s generalizability. “It’s really cool how general it is!” she says with a smile. “We’ve tested it on out-of-domain datasets – real-world, high-motion scenes – and it still works!”

NOTE: This interview was conducted before the award winners were announced, which is why it refers to the paper as a Best Paper nominee and award candidate.
