

What particular example did you use in this work?
The main task was comparing two classes of birds: we gave the annotator six seconds to explore the objects and find the differences. Then we showed one instance of one of the two previous classes and gave the annotator five seconds to decide which class they thought the image belonged to. Five seconds is quite short; gaze studies usually work with longer fixations. In our case the time was short because the annotator had to give a fast reply, so we had to process the gaze data in a different way: we removed the outliers and used a shorter duration threshold for the fixations.
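
A rough sketch of this preprocessing, assuming raw samples of (x, y, timestamp) and illustrative thresholds and dispersion-based grouping rather than the exact values and method from the study:

    import numpy as np

    def remove_outliers(samples, width, height):
        # Keep only gaze samples (x, y, t_ms) that land inside the image.
        s = np.asarray(samples, dtype=float)
        keep = ((s[:, 0] >= 0) & (s[:, 0] < width) &
                (s[:, 1] >= 0) & (s[:, 1] < height))
        return s[keep]

    def detect_fixations(samples, max_dispersion=25.0, min_duration_ms=50.0):
        # Group consecutive samples into a fixation while their spatial
        # spread stays under max_dispersion pixels; keep a group only if
        # it lasts at least min_duration_ms (kept short here, since the
        # annotators had just a few seconds to answer).
        fixations, start = [], 0

        def flush(window):
            if len(window) > 1 and window[-1, 2] - window[0, 2] >= min_duration_ms:
                fixations.append((window[:, 0].mean(), window[:, 1].mean(),
                                  window[-1, 2] - window[0, 2]))

        for end in range(1, len(samples)):
            window = samples[start:end + 1]
            if np.ptp(window[:, 0]) + np.ptp(window[:, 1]) > max_dispersion:
                flush(samples[start:end])
                start = end
        flush(samples[start:])
        return fixations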
How did you solve the problem?
After we collected the gaze data, we processed the raw data and obtained the gaze points from it. Then we wanted to come up with a class representation, which zero-shot learning needs to aid the classification task. To do this, we used the gaze data of the individual images of a class to get an image representation, and then averaged these image representations into one class representation. So we had three types of representations; the details are all in the paper. We extracted many features from the gaze points: the location on the image, the duration, the pupil diameter of the annotator, and the sequence information between the points, which is the angle between subsequent points.
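
A minimal sketch of how such per-fixation features might be assembled into an image representation and averaged into a class representation; the feature layout and the fixed number of fixations are assumptions for illustration, not the exact encoding from the paper:

    import numpy as np

    def fixation_features(fixations, pupil_diameters):
        # Per fixation: location (x, y), duration, pupil diameter, and the
        # sequence information, i.e. the angle to the following fixation.
        f = np.asarray(fixations, dtype=float)      # rows of (x, y, duration)
        step = np.diff(f[:, :2], axis=0)
        angles = np.append(np.arctan2(step[:, 1], step[:, 0]), 0.0)
        return np.column_stack([f, np.asarray(pupil_diameters, dtype=float),
                                angles])

    def image_representation(features, n_fix=8):
        # Pad or truncate to a fixed number of fixations, then flatten.
        out = np.zeros((n_fix, features.shape[1]))
        k = min(n_fix, len(features))
        out[:k] = features[:k]
        return out.ravel()

    def class_representation(image_reps):
        # Average the per-image gaze representations of one class.
        return np.mean(image_reps, axis=0)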
Using this information, we noticed that the location, duration and sequence were more helpful than the pupil diameter. Studies say that pupil diameter indicates the concentration level of the annotator, but apparently, as the annotators became familiar with the categories, their concentration dropped. So this information wasn't very helpful, and we got better performance using only the other features.
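
To place the class representation in the zero-shot pipeline, here is a minimal sketch of prediction with a bilinear compatibility between image features and gaze-based class representations; the matrix W and this overall setup are assumptions about the general recipe, not the paper's exact objective:

    import numpy as np

    def predict_class(image_feat, class_reps, W):
        # Score each unseen class by the compatibility image_feat^T W rep
        # and return the best-scoring label. W would be learned on seen
        # classes only; here it is simply passed in.
        scores = {label: float(image_feat @ W @ rep)
                  for label, rep in class_reps.items()}
        return max(scores, key=scores.get)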
What are the next steps for your work?
In future work we want to explore how to combine the gaze information from different images to represent one class in a better way. We would also like to run experiments on more datasets. Our work compared birds and pets at the species level, but we could explore more or larger fine-grained datasets, for example asking how we can compare at the subspecies level.
You seem very passionate about this subject. What do you particularly like?
I have always been interested in computer vision and how vision works in humans: how we understand that one object is different from another with just one glimpse. So it was interesting for me to study the human behavior. Zero-shot learning is particularly interesting because for us as humans this is easy. For