Namdar Homayounfar is a PhD student in
the Department of Statistical Sciences
at the University of Toronto and is
currently interning at Uber.
The goal of their work, Namdar told us,
is to localise a sports field in an image.
Doing this is the first step in data
generation in sports, because based on
knowing where the field is, the players
can be localised and more statistics
about them can be extracted, such as how
much they run, which positions they
occupy, or whether they are offside.
The novelty of this work is the way
they formulated and solved the
problem: they are the first to use
single-image (monocular) inputs, and
their approach is fully automatic, very
fast and exact.
Originally, Namdar was trying to solve
a different problem - to automatically
generate captions and statistics about
the players. But soon he realised that
in order to do so, they first needed to
localise the field. He tried using
existing methods, but after a few
months had to conclude that they did
not perform well enough. “This
problem could be solved if we had four
points from the image and four points
from the model, and we tried to
estimate the homography matrix
directly”, he explains. But figuring out which
four points in the image correspond to
which four points in the model turns
out to be a very difficult problem.
Besides just localising the sports field,
they also wanted additional information
about it, like where the grass is, where
the lines of the field are, or where the
outside of the field is. It is very hard to
do this using handmade heuristics, due
to the variations between different
fields. Therefore, they came up with a
new machine learning method to solve
all of these problems of field
localisation.
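
The four-point formulation Namdar mentions can be illustrated with a minimal sketch of the classical direct linear transform (DLT). This is not the authors' method; it only shows how a homography follows once the four correspondences are already known, which is exactly the part that is hard to obtain automatically. The function names, the 105 x 68 field model and the pixel coordinates are all assumptions made up for illustration.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points (DLT).

    src and dst each hold four (x, y) pairs, no three of them collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the
        # nine entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The null vector of A is the right singular vector that belongs to
    # the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the scale; assumes H[2, 2] is non-zero, as it is in this example.
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical data: corners of a 105 x 68 field model and the pixels
# where those corners are assumed to appear in a broadcast image.
model_pts = [(0, 0), (105, 0), (105, 68), (0, 68)]
image_pts = [(210, 480), (1050, 470), (860, 250), (380, 255)]

H = homography_from_points(model_pts, image_pts)
reprojected = apply_homography(H, model_pts)
```

With exactly four correspondences the system has a unique solution up to scale, so `reprojected` should match `image_pts` up to floating-point error.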
They use a neural network model that
tells them what each pixel of the image
contains, and thus addresses both field
localisation and the more specific
questions about the field.
These predictions are made per image,
and Namdar tells us that in the future,
they want to do this in a temporal
manner, for videos instead of single
images, incorporating temporal priors
so that there is a smooth transition of
homographies between the frames.
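
To make the idea of a per-pixel prediction concrete, here is a small sketch of how per-pixel class scores can be turned into a label map and a grass mask. The scores are random numbers standing in for the output of a network, and the three-class label set and function names are assumptions for illustration, not the authors' model.

```python
import numpy as np

# Hypothetical label set, loosely following the categories in the text.
CLASSES = ("grass", "line", "outside")

def label_pixels(scores):
    """Turn (height, width, num_classes) scores into a (height, width) label map."""
    scores = np.asarray(scores)
    # Each pixel gets the index of its highest-scoring class.
    return scores.argmax(axis=-1)

def class_mask(labels, name):
    """Boolean mask of the pixels assigned to the named class."""
    return labels == CLASSES.index(name)

# Stand-in for network output: random scores for a tiny 4 x 6 image.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 6, len(CLASSES)))

labels = label_pixels(scores)
grass = class_mask(labels, "grass")
```

The resulting mask can then answer questions such as which part of the image is grass, without hand-made heuristics for each field.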
Namdar’s supervisors and co-authors,
Raquel Urtasun and Sanja Fidler (see
pages 10-14 of this magazine), followed
our discussion attentively. When asked
to name the main appeal of this work,
Raquel told us that “the key
of this work is to come up with a
parameterisation of the problem that
allows you to do efficient inference by
taking into account the structure of the
problem and the advantages of
convolutional neural networks and
deep learning”.
Monday
Sports Field Localization via Deep Structured Models
Besides just localising the sports field,
they wanted to also have additional information
BEST OF CVPR




