MIDL Vision 2020

with Fabio Cuzzolin

videos in real time. The processing needs to happen in real time to detect what the main surgeon is doing as the video streams in. That means drawing a bounding box around the location where the action of interest takes place and then assigning a label to that region.”

For the SARAS-ESAD challenge, the team worked with surgeons at San Raffaele Hospital in Milan, Italy, the clinical partner in the SARAS project, to annotate four videos captured during real surgical procedures on patients. Each video is three to four hours long. Together with expert doctors including Alice Leporini and Armando Stabile, they agreed on a list of relevant surgeon actions and then annotated the real-world videos by drawing a bounding box around each action of interest in every video frame and assigning it a label.

The challenge asks participants to design an algorithm that can detect these actions. There are 21 action classes, and each frame can contain more than one action instance, with potentially overlapping bounding boxes. The dataset was divided into three sets: training, validation, and test. The training set alone contains 22,601 annotated frames and 28,055 action instances. Participants could train their systems on the training set, evaluate them on the validation set, and then retrain them. They had several weeks to do this before the test set was released, at which point they could compute the results of their methods on it. These results have now been published. Around 150 entries were submitted, and 15 different teams took part in the challenge.

Fabio tells us the entries were ranked according to two metrics: “The main metric for assessing action detection is mean average precision (MAP). We selected the top three participants according to overall MAP, and also the top three participants according to average precision (AP) at 50 per cent.
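The two ranking metrics can be sketched in Python. This is a minimal illustration, not the challenge's official evaluation code, and it rests on several assumptions: the overlap between a predicted and a true box is measured as intersection over union (IoU), a common convention for overlap "in terms of area"; each true box can be matched by at most one prediction, taken in order of decreasing confidence; per-class AP uses PASCAL VOC-style all-point interpolation; and the overall MAP is the mean of the per-threshold MAP values at the 10, 30 and 50 per cent overlaps used in the challenge. The function names and the class label "cutting" are hypothetical.

```python
"""Sketch of detection AP/MAP at several IoU thresholds (illustrative only)."""


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets, gts, thr):
    """AP for one action class at IoU threshold `thr`.

    dets: list of (frame_id, score, box) predictions for this class.
    gts:  dict mapping frame_id -> list of true boxes for this class.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {f: [False] * len(boxes) for f, boxes in gts.items()}
    flags = []  # 1 = true positive, 0 = false positive, in score order
    for frame, _score, box in sorted(dets, key=lambda d: -d[1]):
        # Match against the best-overlapping true box not yet claimed.
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(frame, [])):
            o = iou(box, g)
            if o > best and not used[frame][j]:
                best, best_j = o, j
        if best_j >= 0 and best >= thr:
            used[frame][best_j] = True
            flags.append(1)
        else:
            flags.append(0)
    # Precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Precision envelope (non-increasing), then area under the P-R curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap


def mean_ap(dets_by_class, gts_by_class, thr):
    """Mean of per-class AP, over classes that have at least one true box."""
    classes = [c for c, g in gts_by_class.items() if any(g.values())]
    aps = [average_precision(dets_by_class.get(c, []), gts_by_class[c], thr)
           for c in classes]
    return sum(aps) / len(aps) if aps else 0.0


def overall_map(dets_by_class, gts_by_class, thresholds=(0.1, 0.3, 0.5)):
    """Average of the per-threshold MAP values."""
    return sum(mean_ap(dets_by_class, gts_by_class, t)
               for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    # One frame, one true box, one prediction with IoU 0.4 ("cutting" is a made-up label).
    gts = {"cutting": {"frame_0": [(0, 0, 10, 10)]}}
    dets = {"cutting": [("frame_0", 0.9, (0, 0, 10, 4))]}
    print(overall_map(dets, gts))  # prints 0.6666666666666666
```

In the toy example the single prediction has an IoU of 0.4 with the true box, so it counts as a correct detection at the 10 and 30 per cent thresholds but as a false positive at 50 per cent, which is why a stricter overlap requirement lowers the score.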
50 per cent means that a detection is considered correct when the predicted bounding box overlaps the true bounding box by at least 50 per cent in terms of area. We assessed average precision with just a 10 per cent overlap, with a 30 per cent overlap, and with a 50 per cent overlap, and the overall MAP value is the average of the accuracy of these