Computer Vision News - November 2020

Shiry Ginosar

…what I do is collect datasets and apply machine learning, always in an end-to-end manner.

Why can't computer vision handle this yet? Is it because we have never trained it with the correct data in order to analyze this kind of information?

Yes, one problem is the data. Up until now, we made a lot of progress with data that was easy to find. The biggest leap in computer vision happened because of ImageNet. ImageNet is made of images that are simply there. It was the right place to start, as these are the easiest images to find online: images that people just took, like snapshots or photographs. It's much harder to find data that would be useful for learning more nuanced things. That's one thing: finding data is hard.

The other thing that's hard is that a lot of what we do very well is supervised learning. That means somebody needs to go in there and annotate the data. Whether it's word labels for ImageNet, bounding boxes for detection work on ImageNet, COCO and datasets like that, or segmentation annotation, somebody needs to tell us exactly which pixels belong to the horse and which pixels lie outside the horse, that sort of thing.

Now we're working on a project where the question is: when two people are talking, is their motion synchronized? Are they actually engaged in the conversation? Are they responsive to one another? How are you going to annotate that? That's very hard. People have been studying things like this in psychology, and usually they use very rough coding mechanisms, for instance when they look at intersubjectivity between babies and mothers. These are the little responses between them, the motion conversation between the two: one side of the conversation moves, and then the other side does something within a second. It takes our psychology friends forever to code this for just an hour of footage. It's impossible for us to learn anything

"It's a whole different world!"
