

other is the impact it can have. The
impact it can have is that we will one
day be able to train from tens of
millions - no, billions - of training
examples that cover tens of thousands
of object detectors. This is in fact
necessary to reach human-level ability:
you need lots of samples and lots of
classes. And one day we will be able to
do that at a cost that is within the
ability… maybe not of everyone, but at
least within the ability of a millionaire
[he laughs], in the million dollar range.
Now this would be absolutely
impossible if you want a complete
annotation of every pixel in an image.
You cannot do a million objects,
basically. The problem will only be
solvable once we have all the
annotated data, and we will not
annotate it by hand the way we are
doing it in the fully supervised world.
That’s why reducing the annotation
time is not just a sport, it’s an enabler
of solving computer vision. Now if you
go to the scientific reason, which I am
even more passionate about, it is a
very interesting information-theory
type of trap. When you have a weakly
supervised learning problem, let’s say,
this image where we stand now: this is
a couch, this is Ralph, this is Vitto, this
is a plant in Hawaii. You have these
labels, and there are actually
combinatorially many assignments of
the pixels in the image to these labels.
And all of them are consistent with the
labelling of the image, but some of
them make more sense in terms of
regularity. It’s very interesting that
theoretically, there are many solutions
that are valid, so that strictly and
information-theoretically speaking, it is
impossible to reconstruct pixel-level
labelling of an image from image-level
labels. And yet, there exist some
assignments that are more likely to
make sense in the visual world. For
instance, the pixels on your face
probably all take the same label:
they’re all face. For me it is very
exciting that although we know that
there is no perfect closed-form
solution that will work, there are certain
families of solutions that make more sense in the
visual world and that lead to good
results at test time. So somehow I like
the fact that you start by saying that
the problem is impossible, and yet you
try to solve it.
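
As an illustration of the counting argument above, here is a toy sketch in Python. It is not taken from the interview: the six-pixel 1-D "image", the function names, and the use of label changes between neighbouring pixels as a stand-in for "regularity" are all assumptions made for the example. It enumerates every pixel labelling consistent with three image-level labels, then keeps only the most regular ones.

```python
from itertools import product


def consistent_labelings(n_pixels, labels):
    """Enumerate every pixel labelling that uses each image-level label at least once.

    Assumption: "consistent with the image-level labels" means exactly the
    listed labels appear somewhere in the image, and no others.
    """
    return [a for a in product(labels, repeat=n_pixels) if set(a) == set(labels)]


def label_changes(assignment):
    """Count neighbouring pixel pairs with different labels (lower = more regular)."""
    return sum(1 for left, right in zip(assignment, assignment[1:]) if left != right)


# Hypothetical toy image: a strip of 6 pixels with 3 image-level labels.
labels = ("couch", "person", "plant")
candidates = consistent_labelings(6, labels)

# Keep only the labellings with the fewest label changes between neighbours.
smoothest = min(label_changes(a) for a in candidates)
regular = [a for a in candidates if label_changes(a) == smoothest]

print(len(candidates), smoothest, len(regular))  # prints "540 2 60"
```

Even in this tiny case, 540 labellings agree with the image-level labels, so the labels alone cannot identify the right one. The assumed smoothness preference narrows the set to 60 labellings made of three contiguous regions, which is the kind of family the answer refers to, although it still does not single out a unique solution.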
You still sound as passionate as when
you started to study…
Oh, I am more passionate now! When I
started my PhD, I felt like a kid in a
candy store. You jump at everything
that looks cool, and you grab
something, lick it a bit, then you take
something else… so there is no
continuity of mission. Now I am
equally motivated, but because I
focused the energy of my team over
multiple years on a family of problems,
I also see a lot more progress. And I
appreciate the fine details of these
families of problems. So in fact I
feel more passionate now
than when I started.
Do you have tips on how to keep the
passion over a long period of time?
Tuesday: Vittorio Ferrari