

What particular example did you use in this work?
The main task was comparing two classes of birds: we gave the annotator six seconds to explore the objects and find the differences. Then we showed one instance of one of the two previous classes and gave the annotator five seconds to decide which class they thought the image belonged to. Five seconds is quite short; gaze studies usually work with longer fixations. In our case the time was short because the annotator had to give a fast reply, so we had to process the gaze data in a different way: we removed the outliers and used a shorter duration threshold for the fixations.
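
A rough sketch of this preprocessing, assuming raw samples of (x, y, timestamp) and illustrative thresholds and dispersion-based grouping rather than the exact values and method from the study:

    import numpy as np

    def remove_outliers(samples, width, height):
        # Keep only gaze samples (x, y, t_ms) that land inside the image.
        s = np.asarray(samples, dtype=float)
        keep = ((s[:, 0] >= 0) & (s[:, 0] < width) &
                (s[:, 1] >= 0) & (s[:, 1] < height))
        return s[keep]

    def detect_fixations(samples, max_dispersion=25.0, min_duration_ms=50.0):
        # Group consecutive samples into a fixation while their spatial
        # spread stays under max_dispersion pixels; keep a group only if
        # it lasts at least min_duration_ms (kept short here, since the
        # annotators had just a few seconds to answer).
        fixations, start = [], 0

        def flush(window):
            if len(window) > 1 and window[-1, 2] - window[0, 2] >= min_duration_ms:
                fixations.append((window[:, 0].mean(), window[:, 1].mean(),
                                  window[-1, 2] - window[0, 2]))

        for end in range(1, len(samples)):
            window = samples[start:end + 1]
            if np.ptp(window[:, 0]) + np.ptp(window[:, 1]) > max_dispersion:
                flush(samples[start:end])
                start = end
        flush(samples[start:])
        return fixations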
How did you solve the problem?
After we collected the gaze data, we processed the raw data and obtained the gaze points from it. Then we wanted to come up with a class representation, which zero-shot learning needs to aid the classification task. To do this, we used the gaze data of the individual images of a class to get an image representation, and then averaged these image representations into one class representation. So we had three types of representations; the details are all in the paper. We extracted many features from the gaze points: the location on the image, the duration, the pupil diameter of the annotator, and the sequence information between the points, which is the angle between subsequent points.
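
A minimal sketch of how such per-fixation features might be assembled into an image representation and averaged into a class representation; the feature layout and the fixed number of fixations are assumptions for illustration, not the exact encoding from the paper:

    import numpy as np

    def fixation_features(fixations, pupil_diameters):
        # Per fixation: location (x, y), duration, pupil diameter, and the
        # sequence information, i.e. the angle to the following fixation.
        f = np.asarray(fixations, dtype=float)      # rows of (x, y, duration)
        step = np.diff(f[:, :2], axis=0)
        angles = np.append(np.arctan2(step[:, 1], step[:, 0]), 0.0)
        return np.column_stack([f, np.asarray(pupil_diameters, dtype=float),
                                angles])

    def image_representation(features, n_fix=8):
        # Pad or truncate to a fixed number of fixations, then flatten.
        out = np.zeros((n_fix, features.shape[1]))
        k = min(n_fix, len(features))
        out[:k] = features[:k]
        return out.ravel()

    def class_representation(image_reps):
        # Average the per-image gaze representations of one class.
        return np.mean(image_reps, axis=0)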
Using this information, we noticed that the location, duration and sequence were more helpful than the pupil diameter. Studies say that pupil diameter indicates the concentration level of the annotator, but apparently, as the annotators became familiar with the categories, their concentration dropped. So this information wasn't very helpful, and we got better performance using only the other features.
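
To place the class representation in the zero-shot pipeline, here is a minimal sketch of prediction with a bilinear compatibility between image features and gaze-based class representations; the matrix W and this overall setup are assumptions about the general recipe, not the paper's exact objective:

    import numpy as np

    def predict_class(image_feat, class_reps, W):
        # Score each unseen class by the compatibility image_feat^T W rep
        # and return the best-scoring label. W would be learned on seen
        # classes only; here it is simply passed in.
        scores = {label: float(image_feat @ W @ rep)
                  for label, rep in class_reps.items()}
        return max(scores, key=scores.get)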
What are the next steps for your work?
In future work we want to explore how to combine the gaze information from different images to represent one class in a better way. We would also like to run experiments on more datasets. Our work compared birds and pets at the species level, but we could explore more or larger fine-grained datasets, for example asking how we can compare at the subspecies level.
You seem very passionate about this subject. What do you particularly like?
I have always been interested in computer vision and how vision works in humans: how we understand that one object is different from another with just one glimpse. So it was interesting for me to study the human behavior. Zero-shot learning is particularly interesting because for us as humans this is easy. For