ICCV Daily 2019 - Thursday

What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention

Oral Presentation

Antonino Furnari is a postdoctoral researcher working with Giovanni Maria Farinella on First Person Vision at the Image Processing Laboratory at the University of Catania. They speak to us ahead of their oral and poster today.

The focus of their work is egocentric action anticipation. The main idea is: you have a video acquired from the point of view of a person wearing a camera, and you want to observe the video and predict what the person is going to do next. To achieve this, you usually have to create a summary of what you see, to remember which actions have been performed and which objects are in the scene. You then use this information to make a prediction about what will happen next. This work models these two steps separately, with two different long short-term memory (LSTM) models. The relation between the items is captured implicitly by the LSTMs.

Antonino tells us the challenging part is that this is egocentric data, which means it is acquired from the point of view of the person, so there is a lot of variability. One moment a person might be doing something, and a second later they are in another part of the room doing something else. Current algorithms do not work on this kind of data, and anticipating what is going to happen is difficult in itself. How did they solve this? Antonino explains: “There are three main points. One point is you need to separate these two tasks. We call it encoding ...”

From left: Antonino and Giovanni Maria
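The two-stage design described above can be sketched abstractly. This is a toy illustration, not the authors' Rolling-Unrolling LSTM implementation: it stands in plain tanh recurrences for the LSTM cells, uses random, untrained weights, and hypothetical names (`encode`, `anticipate`), purely to show the separation between summarising the observed past and stepping forward to predict future actions.

```python
import numpy as np

rng = np.random.default_rng(0)

HID = 16        # hidden state size
FEAT = 8        # per-frame feature size
N_ACTIONS = 5   # number of action classes (toy value)

# Random weights for illustration only; a real model would learn these.
W_enc = rng.standard_normal((HID, HID + FEAT)) * 0.1   # "rolling" encoder
W_dec = rng.standard_normal((HID, HID)) * 0.1          # "unrolling" predictor
W_out = rng.standard_normal((N_ACTIONS, HID)) * 0.1    # classifier head

def encode(frames):
    """'Rolling' phase: fold the observed frames into one summary state."""
    h = np.zeros(HID)
    for x in frames:
        h = np.tanh(W_enc @ np.concatenate([h, x]))
    return h

def anticipate(h, steps=3):
    """'Unrolling' phase: step the summary state into the future and
    emit an action-probability distribution at each anticipation time."""
    preds = []
    for _ in range(steps):
        h = np.tanh(W_dec @ h)
        logits = W_out @ h
        preds.append(np.exp(logits) / np.exp(logits).sum())  # softmax
    return preds

frames = rng.standard_normal((10, FEAT))   # toy per-frame features
preds = anticipate(encode(frames))
```

The point of the split is that the encoder only has to remember what has happened, while the predictor only has to extrapolate, so the two recurrences can specialise.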
