Computer Vision News - August 2018

Now, let’s look at each step in a little more depth:

Step I: Task-Specific Modeling

For each of the 26 visual tasks, the authors trained a fully supervised, task-specific network. All of the networks share the same encoder-decoder architecture. The encoder is identical across all 26 tasks: a ResNet-50 with the average pooling removed and the last stride-2 convolution replaced by a stride-1 convolution, which gives an output of shape 16 × 16 × 2048. The decoder is specific to each task (see table below).
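To see where the 16 × 16 × 2048 encoder output comes from, the spatial size can be traced through the downsampling layers of a ResNet-50. The sketch below is only shape arithmetic, not the authors' code. It assumes a 256 × 256 input image (the input size is not stated in this section) and the standard ResNet-50 kernel, stride, and padding values. The function names `conv_out` and `resnet50_encoder_shape` are hypothetical, chosen for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_encoder_shape(input_size=256, last_stride=1):
    """Trace the spatial size through a ResNet-50 without average pooling.

    input_size=256 is an assumption, not a value taken from the article.
    last_stride=1 models the modified encoder; last_stride=2 is the
    standard ResNet-50.
    """
    # (kernel, stride, padding) of each layer that can change the spatial size
    layers = [
        (7, 2, 3),            # stem convolution
        (3, 2, 1),            # max pooling
        (3, 1, 1),            # stage 1, no downsampling
        (3, 2, 1),            # stage 2
        (3, 2, 1),            # stage 3
        (3, last_stride, 1),  # stage 4, the stride replaced in the encoder
    ]
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    # The last ResNet-50 stage outputs 2048 channels.
    return (size, size, 2048)


modified = resnet50_encoder_shape(last_stride=1)  # encoder described above
standard = resnet50_encoder_shape(last_stride=2)  # unmodified ResNet-50
print(modified, standard)
```

Under these assumptions, changing the last stride from 2 to 1 doubles the spatial resolution of the encoder output from 8 × 8 to 16 × 16, while the channel count stays at 2048.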