Computer Vision News - January 2020

Endoscopic Vision

The submitted methods use deep learning models for this challenge. The general approach is to use a CNN, such as ResNet-50 or VGG16, to extract features from each frame. The features are then used in different ways for the different tasks. For tool detection, the features are fed into an additional CNN that outputs the location and type of the tool. For action and phase recognition, the features are used as input to a Long Short-Term Memory (LSTM) model, one of the most commonly used recurrent neural networks. In addition to the features of the current frame, the LSTM receives its own output from the previous frame, which allows it to exploit the temporal dimension of the video.

Results

In this challenge, the submitted models were evaluated on a test set of 9 annotated videos of laparoscopic surgery. The videos were annotated by one or more experts, depending on the difficulty of the task. Models were evaluated using the F1 metric. For the tool presence detection task, the best model achieved an average F1 of 64%. For phase segmentation, the best model achieved an average F1 of 65%.

The full article is available in the Endoscopy section of our website.
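
For readers unfamiliar with the F1 metric used to score the models: it is the harmonic mean of precision and recall. The following is a minimal Python sketch, assuming binary per-frame labels for a single tool and a simple mean over per-video scores. The function names and the toy data are hypothetical, and the challenge's exact averaging scheme is not described in this summary.

```python
def f1_score(y_true, y_pred):
    """F1 for binary per-frame labels (1 = tool present, 0 = absent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # Convention chosen for this sketch: no true positives gives F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(videos):
    """Mean of per-video F1 scores; `videos` is a list of (y_true, y_pred) pairs."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in videos]
    return sum(scores) / len(scores)


# Toy data, made up purely for illustration (not challenge data).
video_a = ([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
video_b = ([0, 1, 1, 1], [0, 1, 1, 1])
print(average_f1([video_a, video_b]))
```

Because F1 ignores true negatives, it is less flattering than plain accuracy when a tool is absent from most frames, which is one plausible reason for using it on surgical video.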