CVPR Daily - Wednesday
human annotators, you have less control over what the content of each question is. This work uses a synthetic generation method to build a set of questions designed to measure different aspects of visual reasoning. The team made a symbolic representation of a video through what is called a spatio-temporal scene graph, which annotates the objects, actions and relationships that occur within a subset of frames. With that annotation, which came from two previous datasets, Charades and Action Genome, you can automatically generate questions using templates. These templates reason about the symbolic video representation to find the answers to questions.

“The most challenging part was realizing that even though we’re saying people break down complex visual information into these compositional parts, humans’ actual understanding of those different parts is still very complex, and people have different definitions of what each object means,” Madeleine explains. “Understanding how to standardize the annotations that people have done over the video to make them into useful questions was difficult. We also included metrics that were specifically made to measure compositional reasoning. This hadn’t been done before for video question-answering or beyond small toy datasets over images. We spent some time thinking, how do we manipulate the training and test set in order to specifically measure compositional reasoning, instead of just visual recognition? That was a challenge.”

Madeleine tells us her favourite part of the project was thinking deeply about all the different questions that you can ask about a video, and then breaking those down into individual reasoning steps before putting them together to get something bigger. “There’s something that I personally find really interesting about modularity,” she says.
“I know there’s a debate about whether or not that’s the right approach, but I find it really fascinating that you can take something and extract out the most necessary parts and reason over those individual things.”

Madeleine Grunde-McLaughlin
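
To make the idea of template-based question generation over a spatio-temporal scene graph more concrete, here is a minimal sketch in Python. It is not the team's code or data format: the `Frame` class, the toy `scene_graph`, its labels and the single question template are all assumptions invented for illustration. It only shows how a template can read a symbolic video annotation to produce both a question and its answer.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One annotated frame (hypothetical format): actions under way and
    (relation, object) pairs observed in that frame."""
    index: int
    actions: set = field(default_factory=set)
    relations: set = field(default_factory=set)  # e.g. {("holding", "cup")}


# Toy scene graph with invented labels, standing in for real annotations.
scene_graph = [
    Frame(0, {"opening a door"}, {("touching", "door")}),
    Frame(1, {"walking"}, {("holding", "cup")}),
    Frame(2, {"drinking"}, {("holding", "cup"), ("sitting on", "sofa")}),
]


def first_frame_with_action(graph, action):
    """Return the index of the first frame where the action occurs, or None."""
    for frame in graph:
        if action in frame.actions:
            return frame.index
    return None


def objects_after(graph, relation, action):
    """Reasoning steps behind the template: locate the action in time,
    keep only later frames, then collect objects with the given relation."""
    start = first_frame_with_action(graph, action)
    if start is None:
        return None  # the action never happens, so the question is invalid
    found = set()
    for frame in graph:
        if frame.index > start:
            found.update(obj for rel, obj in frame.relations if rel == relation)
    return sorted(found)


def generate_question(graph, relation, action):
    """Fill the template and compute its answer from the scene graph."""
    question = f"Which objects was the person {relation} after {action}?"
    return question, objects_after(graph, relation, action)


question, answer = generate_question(scene_graph, "holding", "opening a door")
print(question, answer)
```

Each helper corresponds to one small reasoning step (find an action, filter by time, query a relationship), which is the kind of modular composition the article describes.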