Computer Vision News - October 2018

Atrous depthwise conv: 3×3 Depthwise separable convolution decomposes a standard convolution into (a) a depthwise convolution (b) a pointwise convolution (combining the outputs from depthwise convolution across channels). And (c) depthwise convolution with atrous separable filter. Atrous Spatial Pyramid Pooling (ASPP) The idea is to provide the model with multi-scale information; to do this ASPP adds a series of atrous convolutions with different dilation rates. These rates are designed to capture long-range context. Encoder-decoder The encoder-decoder structure is a common architecture for semantic segmentation. Particularly, U-Net architecture is prevalent in medical imaging segmentation. The architecture is comprised of two parts: encoder (on the left in the figure below) -- the convolution filter extracts multi-dimensional semantic data, gradually reducing the spatial dimension; decoder (on the right in the figure below) -- reconstructs the spatial data lost due to the encoder’s spatial reduction. The U-Net encoder-decoder’s uniqueness lies in the gray arrows in Research 6 Research Computer Vision News