Computer Vision News - October 2018

robustly segment objects at multiple scales and sampling rates. (3) ASPP module augmented with image-level features and batch normalization. (4) A decoder module using depthwise separable convolution to recover object boundaries. By the end of this review you will understand these 4 features. Background: Let’s start with the methods that comprise the building blocks of DeepLab: Depthwise separable convolution is a variation of convolutional filter. Standard convolution is a convolution process that is performed simultaneously spatially and across channels. Depthwise separable convolution, on the other hand, is a two-step convolution process. First, separable convolution performs spatial convolution for each channel separately. As can be seen in the upper part of the figure above -- each channel is dealt with by a different convolution filter. The second step is 1x1 convolution across channels, this is exactly the same as a “regular” convolution, just with a 1x1 maximum spatial filter size. This step can be repeated several times for different output channels. How much of an efficiency gain is there? Let’s look at an example: Let’s suppose a neuron network layer with an input layer size of 3x8x8, with 16 3x3 filters: ● Computation using standard convolution yields 16x3x3x3 = 432 parameters ● Depthwise separable convolution yields 16x3 + 3x3x3 = 48+27 = 75 parameters Atrous convolution, also known as dilated convolution, is just a convolution applied to input with defined gaps. Unlike standard convolution, atrous convolution uses empty gaps inside the filter, determined by parameter r (short for rate). r determines the rate of empty space: the larger r, the larger the empty gaps. In the figure below several r rates between 6 and 24 can be seen. Computer Vision News Research 5 Research Computer Vision News