Computer Vision News - February 2020

3 Summary Revealing Scenes by Inverting ... 5 not; second, generate a dense object (i.e an image) using a sparse one (i.e point cloud); and third, generate natural and good-looking image conditioned on the viewpoint and the given points. To solve all the above challenges, the authors train a cascade of three encoder- decoder neural nets: VisibNet for estimating the visibility of points in the point cloud, CoarseNet for estimating an RGB image using the output of the previous network, and RefineNet that outputs the final color image using the output of the previous network. A summary of the model can be viewed in the following figure. We next describe each of the networks separately: VisibNet - since SfM point clouds are usually sparse, it is hard to determine which point is visible and which one is occluded. To this end, a regression-based network is trained to predict in a supervised manner which point is visible and which one is not. Given training pairs of input features maps F x and target source images x, the objective for training VisibNet is: Where V denotes the neural network and Ux denotes the ground truth visibility map for F x . CoarseNet - this network is trained in order to generate a coarse guess for the final image. It is trained given the output of VisibNet with fixed weights. The training is done using a combination of an L1 pixel loss and an L2 perceptual loss (taken over the output of certain layers in VGG16). The objective of CoarseNet is of the form: ( ) = −∑ [ log ( ( )+1 2 ) + (1 − ) log ( 1− ( ) 2 )] = || ( ) − || 1 + ∑ ||Φ ( ( )) − Φ ( )|| 2 2 =1,2,3