Computer Vision News - August 2017


Anna Rohrbach



her co-authors was to make the model learn this attention over the tracks which they have extracted from the video, and simultaneously try to describe the sentence correctly while predicting the additional modalities they need for their model.

To realise this, they used an encoder-decoder LSTM-based approach. The core of their work is the attention mechanism, which reasons about the tracks of the current clip and the previous clip, and which then jointly addresses the grounding and the co-reference challenges. This then informs the LSTM to predict the right person-specific labels.
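As a rough illustration of the idea of attending over tracks from two clips, here is a minimal sketch in plain Python. It is not the authors' model: the dot-product scoring, the function name `attend_over_tracks`, the feature dimensions and the random inputs are all assumptions made for this example, and the article does not say how the attention scores are actually computed.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_over_tracks(hidden, current_tracks, previous_tracks):
    """Hypothetical soft attention over person tracks.

    hidden          -- decoder state at the current word (list of floats)
    current_tracks  -- one feature vector per track in the current clip
    previous_tracks -- one feature vector per track in the previous clip
    Assumption: the state and the track features share one dimension,
    so a plain dot product can serve as the relevance score.
    """
    # Candidates: tracks of the current clip, then tracks of the previous clip.
    tracks = current_tracks + previous_tracks
    scores = [dot(hidden, t) for t in tracks]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the track features.
    dim = len(hidden)
    context = [sum(w * t[i] for w, t in zip(weights, tracks)) for i in range(dim)]
    return weights, context


# Toy inputs: 3 tracks in the current clip, 2 in the previous one.
random.seed(0)
dim = 4
hidden = [random.gauss(0, 1) for _ in range(dim)]
current_tracks = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
previous_tracks = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]

weights, context = attend_over_tracks(hidden, current_tracks, previous_tracks)
```

In this sketch, weight placed on a current-clip track loosely corresponds to grounding a mention in the clip being described, while weight placed on a previous-clip track loosely corresponds to linking the mention back to a person seen earlier. The context vector is what would be passed on to the decoder when it predicts the next label.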

Anna’s greatest ambition is to one day tackle the entire movie, rather than looking back only one clip. She says: “I would like to describe the entire movie consistently and coherently, and we are doing baby steps in this direction”. Talking about her supervisor Bernt Schiele (who is also involved in this work), she said that he always says: “Shoot for the moon; even if you miss, you'll land among the stars!”. In this spirit, Anna is aiming at something very ambitious, and although they might not get there immediately, she believes that they will get to “something cool”.

“I am motivated by the idea of helping the visually impaired and blind people”, Anna says, “so I hope that one day we will be able to automatically describe movies and other visual sources to assist them”.

BEST OF CVPR