Computer Vision News - November 2020

Women in Computer Vision

Shiry Ginosar is a postdoc at the University of California, Berkeley working as a Computer Innovation fellow alongside Professor Jitendra Malik.

Can you tell us about your work?

What I work on, in general, is rich perception. What do I mean by that? It seems like computer vision is doing really well these days. We can do things like detection, segmentation, and 3D understanding. We are almost at the point that we can have self-driving cars that we can release in very sterile environments. We can do all kinds of things that computers care about. But computer vision doesn't work at all for things that humans care about, that we care about.

For example, think about interpersonal communication, when people talk to each other. We require a lot of information that's not captured at all in current models. We don't only care about what the subject matter is, which would be classification, or where it is in the image, like detection or segmentation. We don't really care about the full 3D shape of what we're looking at. We actually care much more about the nonverbal stuff. We read a lot about other people: their gestures, how they look today, what they're wearing, and all kinds of little motions that they make with their faces and their bodies.

We don't just care whether there's a person there. We care whether they're fashionable, whether they're rich, whether they're useful to us. Is the person listening? Are they excited, or are they bored out of their minds? Are they stressed or are they being sincere? Are they lying to me? In the case of cars, if I'm looking at a kid that's playing on the sidewalk, I would think: are they likely to jump into traffic right in front of me? Should I stop myself driving the car? That sort of thing.

We are completely stuck on making progress on any of these questions. That is because they're hard to formulate, and they're hard to evaluate. It's very mushy stuff.
The data to learn from is really hard to find and annotate. What I've been trying to do is to achieve the kind of rich perception that can enable these natural human tasks. To do that, I usually focus on building all kinds of necessary tools that are needed to learn these kinds of things directly from big data. A lot of
