Computer Vision News - January 2019

Do we really need ImageNet pre-training?
Research by Assaf Spanier

Every month, Computer Vision News reviews a research paper from our field. This month we have chosen Rethinking ImageNet Pre-training. We are indebted to the authors (Kaiming He, Ross Girshick, and Piotr Dollár) for allowing us once again to use their images to illustrate our review. Their article is here.

This remarkable paper demonstrates competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization, on par with their ImageNet pre-trained counterparts. Training from random initialization is surprisingly robust; the results hold: 1) even when using only 10% of the training data, 2) for deeper and wider models, and 3) for multiple tasks and metrics. The experiments show that ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target-task accuracy.

Introduction: Starting with the R-CNN papers, the early breakthroughs in using deep learning for object detection were achieved with networks pre-trained for image classification on ImageNet and then fine-tuned on the target object detection dataset. Following these results, most modern object detection networks, and many other computer vision algorithms, use the pre-training-then-fine-tuning paradigm. Some of the latest articles push this paradigm even further, pre-training on datasets 6 to 3,000 times the size of ImageNet (ImageNet-5k at 6×, JFT at 300×, and Instagram at 3000×). Yet while the paradigm yields significant improvements for image classification, it provides only a small improvement for object detection tasks (up to about 1.5%), and this improvement dwindles the larger the object detection dataset is relative to the pre-training dataset.

Method and Innovation: In this paper, the authors show:

1. Though ImageNet pre-training speeds up convergence, training from scratch achieves the same accuracy given sufficient training time. Note that in training from scratch the network must also learn the low-level and mid-level features (such as edges and textures) that a pre-trained backbone already brings along; the sketch below contrasts the two initialization regimes.
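To make the two regimes concrete, here is a minimal sketch of the difference in code. It uses torchvision's Mask R-CNN builder purely as an illustration; the paper's own experiments use Mask R-CNN in Detectron with ResNet backbones, so the function and arguments below are assumptions of this sketch, not the authors' exact setup.

```python
# Minimal sketch: ImageNet pre-training vs. random initialization for a
# COCO detector (illustrative; not the paper's Detectron configuration).
import torchvision

# (a) The classic paradigm: the ResNet-50 backbone is initialized from
# ImageNet classification weights, and the whole detector is then
# fine-tuned on COCO.
finetuned_model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=False,          # no COCO-trained detector weights
    pretrained_backbone=True,  # ImageNet pre-trained ResNet-50 backbone
)

# (b) Training from scratch: every weight starts from random
# initialization, so low- and mid-level features (edges, textures)
# must be learned from the detection data itself.
scratch_model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=False,
    pretrained_backbone=False,
)
```

Everything else (data, losses, optimizer) stays identical between the two; per the findings above, the from-scratch model simply needs a longer training schedule to reach the same final accuracy.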
