Computer Vision News - June 2019
Understanding and Implementing the t-SNE Visualization
We Tried for You, by Amnon Geifman

When receiving new data for a learning task, the learner's first challenge is to understand the underlying structure of the data. For example, given a set of images and their corresponding labels, we would like to explore the relation between the feature vectors and the targets: is there a pattern in the data? Is that pattern complex or simple? A good answer to these questions makes the learning task much more straightforward. The difficulty lies in human perception: most of us cannot imagine a high-dimensional vector space, and directly visualizing a feature vector becomes impossible once its dimension is larger than three. To this end, we review and implement for you a very powerful visualization tool called t-SNE, which enables us to embed high-dimensional data in a two- or three-dimensional space. We begin by explaining what exactly t-SNE is, and continue by implementing code that performs such a visualization on the MNIST dataset.

Understanding t-SNE

t-SNE (t-distributed Stochastic Neighbor Embedding) is an unsupervised technique for visualizing high-dimensional data. It was first introduced by Laurens van der Maaten and Geoffrey Hinton in the paper "Visualizing Data Using t-SNE" (2008). To appreciate the impact of the original paper: it has gained over 7,900 citations to date, and the count keeps growing.

Given a set of n points in a high-dimensional space, x_1, x_2, ..., x_n, the t-SNE algorithm seeks a low-dimensional representation y_1, y_2, ..., y_n such that the local and global geometry of the points is preserved. The embedding can be divided into three simple steps: first, define a pairwise probability distribution (similarity) between each pair of high-dimensional points x_i, x_j, denoted p_ij. Then, define a pairwise probability q_ij for the corresponding low-dimensional points y_i, y_j. Lastly, optimize over the low-dimensional representation to minimize the distance between the distribution P and the distribution Q. We shall now review each step separately.

1) Similarity measure in the high-dimensional space: imagine a set of data points, where around each data point we center a Gaussian. Given a point x_i, we look at the density of another point x_j under the Gaussian centered at x_i. The idea is that if x_i and x_j are close, their conditional probability will be high, while if they are far apart, it will be low. To formulate this idea mathematically, a Gaussian kernel defines the conditional probability

    p_{j|i} = exp(-||x_i - x_j||^2 / 2σ_i^2) / Σ_{k≠i} exp(-||x_i - x_k||^2 / 2σ_i^2),

where the bandwidth σ_i is chosen per point so that the effective number of neighbors (the "perplexity") is comparable across the dataset.
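To make this first step concrete, here is a minimal NumPy sketch of the similarity computation just described. For simplicity it assumes a single, fixed bandwidth sigma shared by all points; the actual algorithm instead runs a binary search for a separate σ_i per point to hit a target perplexity, so treat this as an illustration of the kernel, not a faithful t-SNE component.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Compute p_{j|i}: Gaussian-kernel similarities between rows of X.

    X: (n, d) array of high-dimensional points.
    sigma: kernel bandwidth. Simplifying assumption -- real t-SNE tunes
    a separate sigma_i per point to match a user-chosen perplexity.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (clipped at 0 for safety)
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T, 0.0)

    # Unnormalized Gaussian affinities; a point is not its own neighbor
    affinities = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)

    # Normalize each row so that p_{j|i} sums to 1 over j
    return affinities / affinities.sum(axis=1, keepdims=True)
```

As promised in the introduction, here is a short, self-contained sketch of the visualization itself. Rather than reimplementing the optimization from scratch, it leans on scikit-learn's TSNE class, and it uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in for full MNIST so that it runs in seconds.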
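```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Load a small MNIST-style dataset (8x8 digit images -> 64-dim vectors)
digits = load_digits()
X, y = digits.data, digits.target

# Embed the 64-dimensional feature vectors into 2-D with t-SNE
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Scatter the 2-D embedding, colored by the digit label
plt.figure(figsize=(8, 6))
scatter = plt.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap='tab10', s=8)
plt.colorbar(scatter, label='digit label')
plt.title('t-SNE embedding of the digits dataset')
plt.show()
```

If the structure-preserving idea above works, the scatter plot should show well-separated clusters, one per digit class; the perplexity parameter (here 30) is the main knob worth experimenting with.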