Daily CVPR - Wednesday

CVPR Daily: Khurram, what is the work that you are presenting?

Khurram: I'm presenting a work called "Predicting the Where and What of Actors and Actions through Online Action Localization". The basic task is to predict what the action is and where it is happening in a video. Let's say we are streaming a video: we want to say what the action is and where it is happening by looking at each frame.

CVPR Daily: Can you tell me what the novelty is in this work?

Khurram: The novelty is that it is done online. We are the first who can predict the action and localize it while the video is being viewed. Traditionally, action localization is done in an offline manner: I give you a video, then you localize and detect the action. With our work, we are able to predict what the action is and detect it while you are still viewing the video.

CVPR Daily: What is the main algorithm that you used in this work?

Khurram: We use a combination of superpixels and poses to learn a foreground likelihood model that distinguishes the foreground object from the background. We use the superpixels within the pose bounding box to assign a score to each superpixel. Once we have combined the poses and superpixels, we refine the poses.

CVPR Daily: Can you tell me what was particularly challenging in this work?

Khurram: The challenging part is that, because we are predicting the action, we have limited information as we are viewing. Let's say we have only seen 10 frames of a 100-frame video; the prediction is based on just those 10 frames. That makes it more challenging than offline methods, which have the entire video and the entire motion visible. It becomes easier in that case.

CVPR Daily: What are the practical applications that this work can generate?
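The superpixel scoring step Khurram describes could be sketched as follows. This is a minimal toy illustration, not the paper's actual learned foreground likelihood model: it simply scores each superpixel by the fraction of its pixels that fall inside a pose bounding box. The function name, the label-map representation, and the box format are all assumptions made for the sketch.

```python
import numpy as np

def superpixel_foreground_scores(labels, pose_box):
    """Toy foreground scoring: for each superpixel, return the fraction
    of its pixels that lie inside the pose bounding box.

    labels   : 2-D int array where labels[y, x] is a superpixel id
               (e.g. the output of a superpixel segmentation)
    pose_box : (x0, y0, x1, y1), half-open pixel coordinates of the
               pose bounding box

    NOTE: this is a hypothetical sketch; the paper learns a foreground
    likelihood model rather than using raw box overlap.
    """
    x0, y0, x1, y1 = pose_box
    # Binary mask of the pose bounding box region.
    box_mask = np.zeros(labels.shape, dtype=bool)
    box_mask[y0:y1, x0:x1] = True

    scores = {}
    for sp in np.unique(labels):
        sp_mask = labels == sp
        # Overlap ratio: pixels of this superpixel inside the box.
        scores[int(sp)] = float((sp_mask & box_mask).sum() / sp_mask.sum())
    return scores
```

For example, on a 4x4 frame split into a left superpixel (id 0) and a right superpixel (id 1), with a box covering the left half, superpixel 0 scores 1.0 and superpixel 1 scores 0.0. In the actual method, such scores would feed back into refining the pose estimates.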
