
Now, let’s look at an example with data. For the purposes of the demonstration, a synthetic dataset of vectors of length 32 was prepared, where term 1 of each input vector is deliberately set equal to the label that vector should receive as its output.

After training the network, if we look at the weight vector output by the attention layer in this simplified case (see figure below), we see that the network indeed assigns a much higher weight to term 1, as we would expect, given that we set this term to be identical to the vector’s label. In other words, the attention mechanism was successful: it focused the network’s attention on the part of the input most relevant to classifying the vector correctly, namely term 1.

That was an extremely simple case; a somewhat more useful one is integrating the attention mechanism into an LSTM network. Implementing an LSTM network that includes a simple attention mechanism, as sketched in the second listing below, is effective for understanding how it works.
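First, here is a minimal sketch of how the toy experiment described above could be set up, assuming a Keras-style model; the dataset construction, layer names and hyperparameters are illustrative assumptions rather than the article's original code. The "attention" here is simply a softmax over the input terms whose output is multiplied element-wise with the input before classification.

import numpy as np
from tensorflow.keras.layers import Input, Dense, Multiply
from tensorflow.keras.models import Model

INPUT_DIM = 32  # vectors of length 32, as in the article's toy dataset

def build_dataset(n_samples=10000):
    # Synthetic data: term 1 of each vector is set equal to its label.
    x = np.random.rand(n_samples, INPUT_DIM)
    y = np.random.randint(0, 2, size=(n_samples, 1))
    x[:, 0] = y[:, 0]
    return x, y

inputs = Input(shape=(INPUT_DIM,))
# Softmax over the input produces one weight per term ("attention").
attention_probs = Dense(INPUT_DIM, activation='softmax', name='attention_probs')(inputs)
# Weigh the input terms by their attention weights before classifying.
attended = Multiply()([inputs, attention_probs])
outputs = Dense(1, activation='sigmoid')(attended)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

x_train, y_train = build_dataset()
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)

# Inspect the learned attention weights: the weight on term 1 should
# dominate, since that term carries the label.
attention_model = Model(inputs, attention_probs)
print(attention_model.predict(x_train[:16]).mean(axis=0))

If training succeeds, the printed average weights should be heavily concentrated on term 1, which is the behaviour the figure referenced above illustrates.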
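Next, since the original listing is not reproduced here, below is a minimal sketch of an LSTM with a simple attention mechanism over its timestep outputs, again assuming Keras; the layer arrangement (scoring each timestep, softmax-normalizing the scores, and taking a weighted sum of the LSTM outputs) and all dimensions are assumptions for illustration, not the article's exact code.

import tensorflow.keras.backend as K
from tensorflow.keras.layers import (Input, LSTM, Dense, Flatten, Activation,
                                     RepeatVector, Permute, Multiply, Lambda)
from tensorflow.keras.models import Model

TIME_STEPS = 20   # illustrative sequence length
INPUT_DIM = 2     # illustrative feature size per timestep
LSTM_UNITS = 32

inputs = Input(shape=(TIME_STEPS, INPUT_DIM))

# The LSTM returns its hidden state at every timestep so that attention
# can weigh all of them, not just the final state.
lstm_out = LSTM(LSTM_UNITS, return_sequences=True)(inputs)

# Attention over timesteps: score each timestep, normalize with softmax,
# then compute a weighted sum of the LSTM outputs (the context vector).
scores = Dense(1)(lstm_out)                                   # (batch, time, 1)
scores = Flatten()(scores)                                    # (batch, time)
attention_weights = Activation('softmax', name='attention_weights')(scores)
weights_repeated = RepeatVector(LSTM_UNITS)(attention_weights)  # (batch, units, time)
weights_repeated = Permute((2, 1))(weights_repeated)            # (batch, time, units)
context = Multiply()([lstm_out, weights_repeated])
context = Lambda(lambda x: K.sum(x, axis=1))(context)           # (batch, units)

outputs = Dense(1, activation='sigmoid')(context)
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Because return_sequences=True exposes every timestep's hidden state to the attention layer, the learned softmax weights can later be read out (exactly as in the toy example above) to see which timesteps the network relied on for its prediction.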
