Computer Vision News - October 2023

Hilde Kuehne

What is your work about, Hilde?

I'm working on everything in multimodal learning at the moment. Technically, it means trying to figure out how we can learn from different modalities and across different modalities. So we started with video, where it's obvious that you have more than one modality. Video is not only the vision part; it also has audio. In most cases nowadays, it comes with ASR. So there's also text! Actually, the interesting thing is that we figured out that one modality can actually be used to enhance learning for the other. I think that's a super cool thing because, similar to vision language models, it kind of frees us a bit from having to use annotation. It also opens up the space for anything free text. I think that's super cool!

Tell us why it is super cool.

Oh, that's a hard one. [laughs]

I am here for the hard ones!

Okay, so I especially come from video understanding and action recognition. One problem that we have with actions, probably more than with objects or anything else in the world, is that they are very hard to describe. People usually have a very good understanding of what an object is like. A mug is a mug, period. But actually, understanding actions highly depends on your world knowledge, on your expert knowledge for a specific task, and so on. Therefore, describing actions by pure categories usually works for a certain subset of tasks. This is what we have in current data sets, but it's usually not enough to capture all actions that are going on in the world. Therefore, moving away from pure classification, especially in the context of action and video understanding, is very important. First, having foundation models that actually transfer much better than what we have at the moment, and second, actually to get closer or to do even more for real-world applications.

I understand now why it is cool. Is it cool enough to dedicate the best years of your career to research?

[laughs] Absolutely!

So what is best, teaching or researching?

[hesitates a moment…] Both have good sides and bad sides. If I had to choose at the moment, I would probably say research. However, teaching and research are not separate for me. I mean, obviously, there are lectures. But technically, teaching and research happen together. When we have good Master's students or even PhD students, and they do research, technically, we also teach them on the fly how to be good researchers. This is something that I really love. So, actually, it's both.

Isn't it funny that most of the research is done by people who are not yet proficient in research? They are just learning to research.