CVPR Daily - Tuesday

We speak to Andrea Zunino and Sarah Adel Bargal ahead of their poster presentation today. Andrea is a postdoctoral researcher in the Pattern Analysis and Computer Vision department at the Italian Institute of Technology in Genoa. His supervisor is Vittorio Murino. Sarah Adel Bargal is a PhD student in the Image and Video Computing Group at Boston University. Her supervisor is Stan Sclaroff.…

Andrea and Sarah's work presents a top-down saliency framework for video understanding. Applied to a CNN-LSTM architecture trained for action recognition or video captioning, their method highlights, in the video frames, the spatiotemporal evidence that the model used to classify the action or to generate a specific word in the caption. The aim is to explain why these deep learning techniques, currently the best methods for video understanding, work as well as they do.

Explanation methods of this kind already work well for CNNs in image understanding. What Andrea and Sarah have done is extend them beyond CNNs to models that also include LSTMs and take video as input.

"Actually, the trickiest part was to extend this method to the LSTM architecture. We solved it by normalising these backpropagated probabilities in time."
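To make "backpropagated probabilities" more concrete, here is a minimal sketch under stated assumptions; it is not Andrea and Sarah's implementation. The article does not say how the probabilities are propagated, so the first function assumes the Excitation Backprop rule of Zhang et al. for a single fully connected layer: each output neuron's probability mass is split among its inputs in proportion to activation times positive weight. The second function is one hypothetical reading of "normalising in time": rescale the per-frame saliency maps by a single clip-wide total. Both function names, `excitation_backprop_linear` and `normalise_in_time`, are made up for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def excitation_backprop_linear(p_out: Vector, activations: Vector, weights: Matrix) -> Vector:
    """Pass winning probabilities top-down through one fully connected layer.

    Assumed rule (Excitation Backprop style, hypothetical helper name):
    p_out[i]       -- probability mass currently on output neuron i
    activations[j] -- non-negative input activation a_j (e.g. after a ReLU)
    weights[i][j]  -- weight from input j to output i

    Returns p_in with p_in[j] = sum_i P(a_j | a_i) * p_out[i], where
    P(a_j | a_i) = a_j * max(w_ij, 0) / sum_k a_k * max(w_ik, 0).
    """
    n_in = len(activations)
    p_in = [0.0] * n_in
    for i, p_i in enumerate(p_out):
        # Only excitatory (positive-weight) connections receive mass.
        contrib = [activations[j] * max(weights[i][j], 0.0) for j in range(n_in)]
        z = sum(contrib)
        if z == 0.0:
            continue  # no excitatory input: this neuron's mass is not passed down
        for j in range(n_in):
            p_in[j] += p_i * contrib[j] / z
    return p_in


def normalise_in_time(maps: Matrix) -> Matrix:
    """Rescale per-frame saliency maps so the whole clip sums to one.

    Hypothetical reading of "normalising in time": maps[t][k] is the
    backpropagated probability at location k of frame t. Dividing by one
    clip-wide total keeps values comparable across frames.
    """
    total = sum(sum(frame) for frame in maps)
    if total == 0.0:
        return [[0.0] * len(frame) for frame in maps]
    return [[v / total for v in frame] for frame in maps]


if __name__ == "__main__":
    # One output neuron, two inputs; the negative weight receives no mass.
    print(excitation_backprop_linear([1.0], [1.0, 3.0], [[2.0, -1.0]]))  # [1.0, 0.0]
    # Two frames with two locations each.
    print(normalise_in_time([[1.0, 1.0], [2.0, 0.0]]))  # [[0.25, 0.25], [0.5, 0.0]]
```

Dividing by one total for the whole clip, rather than normalising each frame on its own, is the design choice that keeps a frame with more evidence visibly stronger than a frame with less. Per-frame normalisation would make every frame sum to one and hide that difference.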