Computer Vision News - May 2018

Every month, Computer Vision News reviews a research paper from our field. This month we have chosen to review version 3 of YOLO (You Only Look Once), a state-of-the-art, real-time object detection system. We are indebted to the authors, Joseph Redmon and Ali Farhadi, for allowing us to use images from the paper to illustrate this review. Their work is here.

Introduction:

Remember YOLO? Here comes version 3, with a number of updates and tweaks that improve performance while still maintaining real-time speed. It is a brief paper of just 4 pages, interspersed with numerous jokes and call-outs. Once you filter out the "noise", the following facts remain:

1. Softmax was removed. Instead, simple, independent logistic classifiers were used during training, as the authors found that a multilabel approach models the data better. A binary cross-entropy loss was used for the class predictions.

2. YOLOv3, similarly to feature pyramid networks, predicts boxes at 3 different scales, extracting features from each scale.

3. The output tensor YOLOv3 predicts for the COCO dataset has size N × N × [3 × (4 + 1 + 80)], where 3 is the number of boxes predicted at each scale, 4 is the bounding box offsets, 1 is the objectness prediction, and 80 is the class predictions.

4. A new network for feature extraction was introduced: a hybrid of Darknet-19 (the network used in YOLOv2) and residual networks. The new network uses successive 3x3 and 1x1 convolutional layers with shortcut (residual) connections. It has 53 convolutional layers, so it was named Darknet-53 :-)

Results:

The following table presents the evaluation of the new backbone used by YOLOv3, one of the central changes of the new YOLO. You can see it achieves accuracy comparable to that of ResNet-152 with roughly one-third as many layers, resulting in higher speed. It is not as fast as Darknet-19, but since videos typically run at 24 frames per second, this is more than enough.

Research by Assaf Spanier
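The independent logistic classifiers mentioned in point 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: each class gets its own sigmoid and its own binary cross-entropy term, so several labels (e.g. "woman" and "person") can be positive at once, whereas a softmax would force the class probabilities to compete.

```python
import math

def sigmoid(x):
    """Logistic function: squashes a raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_bce(logits, targets):
    """Independent per-class binary cross-entropy.

    Unlike softmax cross-entropy, each class is scored on its own,
    so targets like [1, 1, 0] (two positive labels) are valid.
    """
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss / len(logits)

# Hypothetical example: an object labeled both "woman" and "person"
logits = [2.0, 1.5, -3.0]   # raw class scores from the network
targets = [1.0, 1.0, 0.0]   # two labels are "on" at the same time
print(multilabel_bce(logits, targets))
```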
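The Darknet-53 pattern from point 4 can be summarized structurally: repeated [1x1 conv, 3x3 conv] pairs wrapped in shortcut connections, between stride-2 downsampling convolutions. The sketch below only counts layers from the stage layout published for Darknet-53; the helper names and tuple encoding are ours, not a runnable network.

```python
def residual_block(channels):
    # Each block: a 1x1 conv that halves the channels, a 3x3 conv
    # that restores them, then a shortcut adding the block's input.
    return [("conv1x1", channels // 2), ("conv3x3", channels), ("shortcut_add",)]

def darknet53_layers():
    layers = [("conv3x3", 32)]  # stem
    # (channels after downsampling, number of residual blocks): 1+2+8+8+4 = 23 blocks
    for channels, repeats in [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]:
        layers.append(("conv3x3_stride2", channels))  # downsample between stages
        for _ in range(repeats):
            layers.extend(residual_block(channels))
    return layers

# 1 stem + 5 downsampling convs + 23 blocks * 2 convs = 52;
# the final classification conv brings the total to 53.
convs = sum(1 for layer in darknet53_layers() if layer[0].startswith("conv"))
print(convs)  # 52
```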
