Computer Vision News - November 2018

We can see the LSTM layer included inside the model_attention function, immediately followed by the attention mechanism (the attention_3d_block function), which, just as in the previous example, weighs the terms of the input by their contribution to a successful prediction. The attention layer is again implemented using a Dense layer whose output is multiplied by the input the layer receives, giving a different degree of focus to different terms of the input. Note the need to permute the data appropriately, so that the attention weights are applied to the terms of the input vectors along the time axis, rather than weighing the same channel differently for different samples. A sketch of such a block appears below.

However, the LSTM-with-attention model is not without drawbacks and difficulties. First, there is still the vanishing gradient problem for long sequences. Second, there is a computational difficulty: an LSTM requires four linear (MLP) layers per cell at every sequence time step. Linear-layer computation is very memory-bandwidth intensive, and it cannot keep many compute units busy when there is not enough memory bandwidth to feed them. Memory-bandwidth-bound computation is one of the major challenges for hardware designers, ultimately limiting the usefulness of neural networks.

A somewhat more sophisticated attention mechanism is the Transformer, which Computer Vision News reviewed last year. A brand new method, which might mark the end of LSTM networks, is described in the paper “Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction”. Here the authors use a CNN of the kind traditionally used for image classification, but represent the data as “images” in a unique way: given a training pair of source and target sentence fragments (s, t) with lengths |s| and |t|, the model embeds them as {x1, ..., x|s|} and {y1, ..., y|t|} and concatenates them into an “image” in which each “pixel” is the concatenation of one source embedding and one target embedding. A sketch of this construction is given at the end of this section.
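As a rough illustration of the structure described above, here is a minimal sketch of such an attention block in Keras (functional API). The names TIME_STEPS, INPUT_DIM, the LSTM size and the single sigmoid output are illustrative assumptions, not the exact code from the article:

from keras.layers import Input, Dense, LSTM, Permute, Multiply, Flatten
from keras.models import Model

TIME_STEPS = 20   # illustrative sequence length
INPUT_DIM = 2     # illustrative feature dimension

def attention_3d_block(inputs):
    # inputs: (batch, TIME_STEPS, lstm_units)
    a = Permute((2, 1))(inputs)                     # (batch, lstm_units, TIME_STEPS)
    a = Dense(TIME_STEPS, activation='softmax')(a)  # attention weights over time steps
    a_probs = Permute((2, 1))(a)                    # back to (batch, TIME_STEPS, lstm_units)
    return Multiply()([inputs, a_probs])            # re-weight each time step of the input

def model_attention(lstm_units=32):
    inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
    lstm_out = LSTM(lstm_units, return_sequences=True)(inputs)
    attention = attention_3d_block(lstm_out)
    output = Dense(1, activation='sigmoid')(Flatten()(attention))
    return Model(inputs=inputs, outputs=output)

The two Permute calls are the permutation mentioned above: the first swaps the time and feature axes so the Dense softmax is computed over the time steps; the second swaps them back so Multiply can re-weight each time step of the LSTM output.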
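To see where the per-step cost comes from, here is a bare NumPy sketch of a single LSTM time step with illustrative dimensions. The four affine transforms (input, forget and output gates plus the candidate cell state) each need their own weight matrices, which must be streamed from memory at every step:

import numpy as np

hidden, embed = 256, 128          # illustrative sizes
x_t = np.random.randn(embed)      # input at time step t
h_prev = np.zeros(hidden)         # previous hidden state
c_prev = np.zeros(hidden)         # previous cell state

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four separate linear layers per cell, per time step
W = {g: np.random.randn(hidden, embed) for g in 'ifoc'}
U = {g: np.random.randn(hidden, hidden) for g in 'ifoc'}
b = {g: np.zeros(hidden) for g in 'ifoc'}

i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state

c_t = f * c_prev + i * c_tilde
h_t = o * np.tanh(c_t)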
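Finally, a minimal sketch of how such a source-target "image" can be built, assuming NumPy and random embeddings in place of real learned ones; the function name build_joint_image is illustrative. In the paper this tensor is then processed by a DenseNet-style 2D CNN:

import numpy as np

def build_joint_image(src_emb, tgt_emb):
    """src_emb: (|s|, d_s) source token embeddings
       tgt_emb: (|t|, d_t) target token embeddings
       returns: (|t|, |s|, d_s + d_t) tensor where pixel (i, j)
       concatenates target embedding y_i with source embedding x_j."""
    S, d_s = src_emb.shape
    T, d_t = tgt_emb.shape
    src_grid = np.broadcast_to(src_emb[None, :, :], (T, S, d_s))
    tgt_grid = np.broadcast_to(tgt_emb[:, None, :], (T, S, d_t))
    return np.concatenate([src_grid, tgt_grid], axis=-1)

# Illustrative usage with random embeddings:
src = np.random.randn(7, 128)   # |s| = 7 source tokens
tgt = np.random.randn(5, 128)   # |t| = 5 target tokens
image = build_joint_image(src, tgt)
print(image.shape)              # (5, 7, 256): one "pixel" per (target, source) pair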
