Computer Vision News - January 2019

edge, textures, etc.), which it usually learns in pre-training.

2. When the pre-training paradigm is reported as being more efficient, the pre-training time isn't always taken into account.

3. The authors show that competitive results can be achieved by training on just 10% of the COCO dataset from random initialization, if hyperparameters are carefully selected to prevent overfitting. Using the same hyperparameter settings as the pre-trained networks, random-initialization training achieves the same results, even when trained on only 10% of the dataset.

4. ImageNet pre-training shows no benefit when the target tasks/metrics are more sensitive to spatially localized predictions.

In the context of the current state of the art, these results are surprising and should challenge the influence of the ImageNet pre-training paradigm. ImageNet pre-training is, and in the near future will continue to be, the go-to solution, especially: 1) where developers have insufficient data or computational resources to train on their target task from scratch, and 2) because ImageNet pre-training is widely seen as a ‘free’ resource, thanks to the labeling and annotation efforts already invested and the wide availability of ImageNet pre-trained models.

However, the authors’ observations suggest that, looking forward, when developers have sufficient data and as computational resources improve, training from scratch (from random initialization) should be seriously considered. The paper demonstrates that collecting data and training directly on the target task is a solution that needs to be considered, especially where there is a significant disparity between the pre-training task and the target task. The new evidence provided and explored by the paper points toward a need for the community to discuss and reevaluate the pre-training / fine-tuning paradigm.

Implementation: Let’s look at the architectures, learning rate, optimization and normalization methods, and hyperparameter settings used in this work.

Architecture
Mask R-CNN with ResNet, ResNeXt, or ResNeXt plus Feature Pyramid Network (FPN) backbones were investigated (a sketch of building such a model from random initialization appears at the end of this section).

Normalization
The normalization methods commonly used in training the standard pre-trained networks are less suitable for detection and segmentation training, since these normalization methods require large batches of data to estimate reliable statistics, while detection and segmentation models are trained on high-resolution images that allow only a few images per GPU.
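To see the normalization point in code: Batch Normalization computes its statistics over the batch dimension, so with only one or two high-resolution images per GPU those statistics become noisy, whereas a batch-size-independent layer such as Group Normalization (which the authors rely on, alongside synchronized BN, when training from scratch) normalizes over channel groups within each image. The snippet below is only an illustrative sketch of that difference, with arbitrary layer sizes chosen for the example:

# Illustrative sketch (arbitrary sizes): why tiny detection batches
# hurt BatchNorm but not GroupNorm.
import torch
import torch.nn as nn

x = torch.randn(2, 256, 64, 64)  # a "batch" of only 2 high-resolution feature maps

# BatchNorm2d normalizes each channel over the whole batch,
# so here its statistics are estimated from just 2 images.
bn = nn.BatchNorm2d(256)

# GroupNorm normalizes over groups of channels within each sample,
# so its behaviour does not depend on the batch size at all.
gn = nn.GroupNorm(num_groups=32, num_channels=256)

print(bn(x).shape, gn(x).shape)  # both: torch.Size([2, 256, 64, 64])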
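Finally, to make the from-scratch setup above concrete, here is a minimal sketch of instantiating a Mask R-CNN with a ResNet-50 FPN backbone and no ImageNet weights at all. It uses torchvision’s detection API as a stand-in; the builder and flags below are assumptions about the reader’s library version, not the authors’ own training code:

# Minimal sketch (assumed torchvision detection API, not the authors' code):
# Mask R-CNN + ResNet-50-FPN with every weight randomly initialized.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=False,            # no COCO-trained detection weights
    pretrained_backbone=False,   # no ImageNet backbone weights either
    num_classes=91,              # default for COCO (background + 90 category ids)
)
# Newer torchvision versions use weights=None, weights_backbone=None instead.

# Sanity check: the randomly initialized model still produces training losses.
model.train()
images = [torch.rand(3, 800, 800)]
targets = [{
    "boxes": torch.tensor([[100.0, 120.0, 300.0, 350.0]]),
    "labels": torch.tensor([1]),
    "masks": torch.zeros(1, 800, 800, dtype=torch.uint8),
}]
losses = model(images, targets)   # dict of RPN / box / mask losses
print(sum(loss.item() for loss in losses.values()))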
