Computer Vision News - November 2016
Tool: t-Distributed Stochastic Neighbor Embedding (t-SNE)

What is t-SNE all about? In a nutshell, t-SNE is a dimensionality-reduction technique suited for visualizing high-dimensional datasets (like the COIL-20 image dataset). t-SNE visualizes the high-dimensional dataset by projecting each high-dimensional data point into 2D or 3D space. t-SNE has become so popular because: (1) it has been demonstrated to be useful in many situations, for example in this Kaggle challenge and as demonstrated in Deep Learning and Human Beings; and (2) it's incredibly flexible, and can often find structure where other dimensionality-reduction algorithms cannot. However, in an effort to tidy up the visualization, the algorithm makes all sorts of adjustments whose effects can be difficult to understand and to evaluate, which can make it intimidating. Don't let the 'mysterious' internal mechanisms make you give up on the entire technique, though. By putting some work into studying how t-SNE behaves in simple cases, you can gain an intuition for how those internal mechanisms work. And that's what we are about to do: we will start by explaining the method itself, followed by a set of illustrative examples demonstrating its performance under various conditions.

t-SNE constructs two probability distributions: (1) a distribution P over pairs of points in the high-dimensional space, which gives similar points a high probability and dissimilar points a low probability; and (2) a distribution Q over the points in the low-dimensional space. Q is learned by minimizing the Kullback–Leibler divergence between the two distributions, $KL(P \| Q)$, with respect to the locations of the points in the map. P is defined as follows: given a set of N high-dimensional points $x_1, \ldots, x_N$, the conditional probabilities $p_{j|i}$, which are proportional to the similarity of points $x_i$ and $x_j$, are given below. This distribution has an entropy which increases as $\sigma_i$ increases.
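To make the high-dimensional side concrete, here is a minimal NumPy sketch (function and variable names are our own, for illustration only) that computes the conditional similarities $p_{j|i}$ for one point at a fixed bandwidth $\sigma$, and their perplexity $2^{H}$. It shows the behavior just described: the entropy, and hence the perplexity, grows as $\sigma$ grows.

```python
import numpy as np

def conditional_probabilities(X, i, sigma):
    """Conditional similarities p_{j|i} of every point j given point i,
    for a fixed Gaussian bandwidth sigma."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)   # squared distances ||x_i - x_j||^2
    logits = -d2 / (2.0 * sigma ** 2)
    logits[i] = -np.inf                    # a point is never its own neighbor
    p = np.exp(logits - np.max(logits))    # numerically stable softmax
    return p / p.sum()

def perplexity(p):
    """Perplexity 2^H(p): a smooth measure of the effective number of neighbors."""
    nz = p[p > 0]
    return 2.0 ** (-np.sum(nz * np.log2(nz)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))              # 50 points in 10-D
# The entropy of p_{.|i} -- and hence its perplexity -- grows with sigma:
perp_narrow = perplexity(conditional_probabilities(X, 0, sigma=0.5))
perp_wide = perplexity(conditional_probabilities(X, 0, sigma=5.0))
```

With a tiny $\sigma$ almost all probability mass sits on the nearest neighbor (perplexity near 1); with a large $\sigma$ the distribution approaches uniform over the other 49 points.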
t-SNE performs a binary search for the value of $\sigma_i$ that produces a conditional distribution with a fixed "perplexity". The "perplexity", which is specified by the user, indicates how to balance between local and global aspects of your data. In the example section we will elaborate on this important parameter with a set of concrete examples. The conditional similarities, and the symmetrized joint similarities $p_{ij}$ built from them, are:

$$p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$$

Distribution Q is, as mentioned, a learnable d-dimensional distribution over map points $y_1, \ldots, y_N$ that should reflect the similarities $p_{ij}$ as closely as possible. To this end, t-SNE evaluates the similarity between $y_i$ and $y_j$ using a Student t-distribution with one degree of freedom:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
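The binary search for $\sigma_i$ can be sketched in a few lines of NumPy. This is a hedged illustration, not the reference implementation: the search bounds, iteration count, and all names are our own choices. It exploits the monotonicity noted above (entropy increases with $\sigma_i$) to bisect toward the user-specified perplexity.

```python
import numpy as np

def perplexity(p):
    """Perplexity 2^H(p) of a discrete distribution."""
    nz = p[p > 0]
    return 2.0 ** (-np.sum(nz * np.log2(nz)))

def sigma_for_perplexity(d2_row, target, n_iter=60):
    """Bisect for the bandwidth sigma_i whose conditional distribution
    p_{.|i} attains the requested perplexity (sketch; bounds are ad hoc).
    d2_row holds the squared distances from point i to every other point."""
    lo, hi = 1e-10, 1e10
    for _ in range(n_iter):
        sigma = 0.5 * (lo + hi)
        p = np.exp(-d2_row / (2.0 * sigma ** 2))
        p /= p.sum()
        if perplexity(p) > target:
            hi = sigma        # entropy too high: narrow the Gaussian
        else:
            lo = sigma        # entropy too low: widen it
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
d2 = np.sum((X - X[0]) ** 2, axis=1)[1:]   # squared distances from point 0 to the rest
sigma = sigma_for_perplexity(d2, target=30.0)
```

Because entropy is monotone in $\sigma_i$, the bisection converges quickly; t-SNE repeats this search once per data point so that every point ends up with the same effective number of neighbors.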