Monday
Anna Rohrbach
her co-authors was to make the model learn this attention over the tracks they have extracted from the video, while simultaneously generating a correct description and predicting the additional modalities their model needs.
To realise this, they used an encoder-decoder LSTM-based approach. The core of their work is an attention mechanism that reasons about the tracks of the current clip and the previous clip, and thereby jointly addresses the grounding and co-reference challenges. This, in turn, informs the LSTM so that it predicts the right person-specific labels.
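The article gives no implementation details, so the following is only a minimal sketch of the general idea, assuming plain dot-product soft attention (a generic technique, not necessarily the mechanism used in this work). The function name `attend_over_tracks`, the use of a decoder hidden state as the query, and the toy feature vectors are all hypothetical. It shows how attention weights over person tracks from both the current and the previous clip can point back to a person seen earlier.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_tracks(
    query: Vector,
    current_tracks: List[Vector],
    previous_tracks: List[Vector],
) -> Tuple[List[float], Vector]:
    """Hypothetical dot-product soft attention over person-track features.

    `query` stands in for the decoder LSTM's hidden state at the current word.
    Candidate tracks come from both the current and the previous clip, so a
    high weight on a previous-clip track models referring back to a person
    seen earlier (co-reference). Returns (weights, context vector).
    """
    tracks = current_tracks + previous_tracks
    if not tracks:
        raise ValueError("need at least one track to attend over")
    # Similarity score between the query and each track feature vector.
    scores = [sum(q * t for q, t in zip(query, track)) for track in tracks]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the track features.
    dim = len(query)
    context = [sum(w * track[i] for w, track in zip(weights, tracks)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy example: one track in the current clip, one in the previous clip.
    weights, context = attend_over_tracks([1.0, 0.0], [[0.0, 1.0]], [[3.0, 0.0]])
    print(weights, context)
```

In this toy input the query aligns with the previous-clip track, so most of the attention mass lands there; in a real captioning model the context vector would be fed back into the LSTM when choosing the next (person-specific) word.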
Anna’s most ambitious goal is one day to tackle the entire movie, rather than looking back only one clip. She says: “I would like to describe the entire movie consistently and coherently, and we are taking baby steps in this direction.” Talking about
her supervisor Bernt Schiele (who is also involved in this work), she said that he always says: “Shoot for the moon; even if you miss, you'll land among the stars!” In this spirit, Anna is aiming at something very ambitious, and although they might not get there immediately, she believes that they will get to “something cool”.
“I am motivated by the idea of helping visually impaired and blind people,” Anna says, “so I hope that one day we will be able to automatically describe movies and other visual sources to assist them.”
BEST OF CVPR




