CVPR Daily - Thursday

You are organizing three workshops at this special CVPR, which will be virtual: we are not meeting in person. The awkward thing is that we are talking before your workshops, but the interview will be published afterwards. [both laugh] It’s a special situation that has never happened before in any previous CVPR Daily. How do you expect the workshops to go?

I’m very optimistic. I would say they went amazingly. They were fantastic. I think the Women in Computer Vision workshop is usually quite well attended. It’s a really important workshop. This year we’re running a virtual mentoring system, matching junior women in the field with senior women through chat rooms. I hope that will go really well.

The other workshop I’m organizing is actually a challenge. I think it will be really great. Given the unusual circumstances, we expected that there would not be much participation, but we had over 7 teams submit, so I think it’s going pretty well.

I am also co-organizing the Sight and Sound workshop, which is quite relevant to the multimodal work I am interested in. It’s also relevant to the poster I will be presenting on Thursday: Speech2Action: Cross-Modal Supervision for Action Recognition.

What do you want to achieve with these workshops and this challenge?

With the challenge, one of the great things is that its goal is video understanding from pre-extracted features. This allows researchers to innovate on different feature-fusion or temporal aggregation methods without the huge computational resources you would otherwise need to process video. It also helps with democratization: people who aren’t at big companies or in very well-funded labs can participate too. So I think that’s really great. Sorry for any background noise. I’m actually in Mumbai, and there’s a cyclone today.
