Computer Vision News - November 2019

Every month, Computer Vision News reviews a research paper from our field. This month we have chosen "Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling". We are indebted to the authors (Jacob Menick, Nal Kalchbrenner) for allowing us to use their images.

by Amnon Geifman

Introduction

Generative models are a subset of unsupervised learning where, given a training set, the task is to generate new examples that follow the data distribution. There are two main ways to model this problem. The first is Generative Adversarial Networks (GANs): in this family of models, a generator needs to fool a discriminator by learning the underlying data distribution and generating good-looking images. The second family is Auto-Regressive (AR) models, where the goal is to learn an explicit distribution governed by a prior imposed by the model structure.

AR models have several advantages: they provide a way to calculate the likelihood, their training is more stable, and they work on discrete data as well. AR models have achieved state-of-the-art results on data such as text, audio and video, but they have so far failed in the domain of unconditional large-scale image generation. There are two main reasons for this failure. The first is the indirect relation between the maximum likelihood score and image fidelity. The second is the high dimensionality of images: an image of size 256x256x3 has 196,608 values, and capturing the dependencies between each pixel and its neighbors requires a huge amount of memory and computation.

"The authors suggest a model named Subscale Pixel Network (SPN), a conditional decoder architecture that generates an image as a sequence of sub images of equal sizes."
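
As a rough illustration of what "a sequence of sub images of equal sizes" can look like, the sketch below assumes that the sub-images are interleaved slices taken with a fixed stride s along each axis. The text above does not spell out the slicing rule, so that rule, the helper names split_into_subimages and merge_subimages, and the toy 4x4 image are all assumptions made for illustration and not the authors' implementation.

```python
def split_into_subimages(image, s):
    """Split an H x W image (a list of rows) into s*s interleaved sub-images.

    Sub-image (i, j) holds the pixels at rows i, i+s, i+2s, ... and
    columns j, j+s, j+2s, ..., so every sub-image has size (H/s) x (W/s).
    This slicing rule is an assumption made for illustration.
    """
    height, width = len(image), len(image[0])
    if height % s or width % s:
        raise ValueError("image sides must be divisible by s")
    return [
        [row[j::s] for row in image[i::s]]
        for i in range(s)
        for j in range(s)
    ]


def merge_subimages(subimages, s):
    """Inverse of split_into_subimages: interleave the sub-images back."""
    sub_h, sub_w = len(subimages[0]), len(subimages[0][0])
    image = [[None] * (sub_w * s) for _ in range(sub_h * s)]
    for k, sub in enumerate(subimages):
        i, j = divmod(k, s)  # row offset and column offset of this sub-image
        for r, row in enumerate(sub):
            for c, value in enumerate(row):
                image[r * s + i][c * s + j] = value
    return image


# Toy 4x4 single-channel "image" whose pixel values are 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
subimages = split_into_subimages(image, s=2)

for k, sub in enumerate(subimages):
    print(k, sub)
```

With s=2, the 4x4 toy image yields four 2x2 sub-images, and merge_subimages restores the original. Under this assumption, a decoder only has to produce one small sub-image at a time, conditioned on the ones already generated, which is one way to read how the approach addresses the memory and computation problem described above.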