She says that a model trained on synthetic data usually does not work as well on real data, because it has not seen the characteristics of real-world data. “The real world is noisy and partial”, Angela says, “because you can’t get a fully complete view of an object”. They therefore have benchmarks showing that, for tasks on real data, training on real data does much better than training on synthetic data alone. “So it matters a lot that we have real-world data, and we could definitely still use more”. ScanNet and its current 1,500 scans provide a good start, and Angela told us that they were able to show generalisation to some earlier, smaller real-world datasets. “But of course, we would like to get more, and this is still what we are working on.”
In the future, Angela would like to see this running on something tangible. She previously worked on 3D reconstruction, where they built a real-time 3D reconstruction system, but from that work she knows that it is very hard to use this kind of thing in practice. One of the things that is missing, she told us, is a semantic understanding of the scene. Even when a reconstructed model looks reasonable, you still want to know where things are, or what they are, so that virtual agents or robots can at least interact with them. “I want to be able to make this happen for real scans!”, she adds enthusiastically.
Angela also told us about the next steps in this line of work. One of the “obvious things” is to scale up beyond thousands of scans: they aim to reach tens of thousands. This, however, requires a different kind of data acquisition, she noted: instead of crowdsourcing only the annotation task, they also want to be able to crowdsource the reconstruction task. Besides this, there is also a lot to be done in terms of semantic segmentation. “Right now, our tasks are still basically: What are objects?”, Angela explains, “and there is a lot more interesting tasks on this type of data”.
One that she is particularly interested in is connecting the real-world data with synthetic CAD models. They did this a little bit with ScanNet, but they want to push further and associate synthetic CAD models with the real-world scans, for example by aligning a synthetic chair on top of the real chair and then correlating the two. “Ideally, you can basically learn a transform to go between real to synthetic”, Angela says. This is a way to make a model usable, since synthetic models are easy to manipulate and fully complete. It is also much easier to train something on synthetic data, but it is not easy to transfer that information to the real world. With this correlation between the two, it could become possible to learn the transfer between synthetic and real data. A method like this might be usable in a VR/AR application.
Angela Dai
Sunday
BEST OF CVPR




