Computer Vision News - August 2018

worked on in particular, optimal transport, is now becoming very important in machine learning. It's interesting that very fundamental mathematics is now drifting into our field of applied mathematics.

You are working in the field of computer vision. Can you tell us more about what you do in the lab?

In our lab, we function exactly like an academic lab: our aim is to publish papers and code. I work on a fairly fundamental research problem, video prediction. Given an input sequence, we want to predict plausible videos that might follow it. This problem is gaining attention from the community for several reasons. The first is that it is a proxy task towards giving computers the ability to anticipate future events, which we think is an essential component of intelligence. The second is that babies learn about physical concepts through observation and interaction with the world. For instance, by the age of two months they learn about object permanence; by eight months they learn about gravity and things like that. You can test this by showing them plausible and implausible sequences and seeing how they react: if they are surprised, it means the implausible video has broken their model of the world, which makes them interested. Since this idea works in babies, we want to see if it can also work for machines, so that through observation and by predicting future events they develop an interesting and useful representation of those physical concepts, which can then be used in other tasks. It is really one of the main current approaches to unsupervised learning, to leverage unannotated data. Finally, it also has very practical applications. For instance, if you can predict the trajectories of vehicles or pedestrians, you can imagine using this kind of algorithm in autonomous driving and other decision-making contexts, so it's also useful. It will be useful! [laughs] The nice thing is that, for now, it doesn't work at all...

That is surprising.

Well, currently, approaches for video prediction either predict a very, very short time ahead, or they predict for a longer term but only on toy examples. Whenever you change the toy problem, everything breaks. Basically, this is because we don't yet know how to model very complex and multimodal distributions, especially when they are conditional. For me, this problem is a great playground for trying to do that.

And how are you contributing to this field?

Well, none of the motivations I've given for video prediction specifically requires that prediction be made at the RGB level. So with my advisors and colleagues at Facebook, we proposed to shift this task to a new one: predicting the semantic segmentation of future frames, instead of the actual RGB frames.
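To make the task shift concrete, here is a minimal, purely illustrative sketch in PyTorch. It is not the model described in the interview; the architecture, layer sizes, context length, and class count are all assumptions chosen only to show the input/output formulation: consume the segmentation maps of a few past frames and predict the segmentation of the next frame.

```python
# Illustrative sketch only, not the interviewee's actual model.
# A toy convolutional predictor mapping the segmentations of the last
# `context_frames` frames to the segmentation logits of the next frame.
import torch
import torch.nn as nn


class FutureSegPredictor(nn.Module):
    def __init__(self, num_classes: int, context_frames: int = 4):
        super().__init__()
        # Past frames are stacked along the channel dimension.
        in_channels = num_classes * context_frames
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, kernel_size=3, padding=1),  # next-frame logits
        )

    def forward(self, past_segs: torch.Tensor) -> torch.Tensor:
        # past_segs: (batch, context_frames, num_classes, H, W)
        b, t, c, h, w = past_segs.shape
        x = past_segs.reshape(b, t * c, h, w)  # stack time along channels
        return self.net(x)                     # (batch, num_classes, H, W)


# Toy usage: predict the segmentation of the frame following 4 observed frames.
model = FutureSegPredictor(num_classes=19, context_frames=4)
past = torch.randn(2, 4, 19, 128, 256)          # assumed batch of past segmentation scores
future_logits = model(past)
target = torch.randint(0, 19, (2, 128, 256))    # assumed future ground-truth labels
loss = nn.functional.cross_entropy(future_logits, target)
```

The point of the formulation is visible in the shapes: supervision comes from the segmentation of a future frame rather than from its raw RGB pixels, so the model only has to predict semantic layout, not appearance.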
