Computer Vision News - June 2021

Automatic Gesture Recognition in Surgical Robotics

Hence, the final training protocol consists of dividing the frames in each video clip according to the gestures, sampling 25 frames to encode a gesture context (1.67 s), and extracting the optical flow. An end-to-end deep encoder-decoder network is trained, and the information encoded in the learned representations is visualized using the UMAP algorithm, which reduces their dimensionality to a 2D plane. These representations are then shown to cluster into two distinct skill-based groups corresponding to beginners and experts. From this comes the paper's first finding: each gesture has a distinct representation depending on whether it was performed by an expert or a beginner.
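The frame-sampling step can be sketched as below. This is a minimal illustration, assuming uniform temporal sampling within each gesture segment (the article does not specify the exact sampling strategy); the function and variable names are hypothetical. Note that 25 frames covering a 1.67 s context implies roughly 15 fps.

```python
import numpy as np

def sample_gesture_frames(frame_indices, n_samples=25):
    """Evenly sample a fixed number of frames from one gesture segment.

    Hypothetical helper: the source only states that 25 frames
    (a 1.67 s gesture context) are sampled per gesture segment.
    """
    # Pick n_samples evenly spaced positions across the segment.
    idx = np.linspace(0, len(frame_indices) - 1, n_samples).round().astype(int)
    return [frame_indices[i] for i in idx]

# A hypothetical gesture segment spanning frames 100-199 of a clip.
segment = list(range(100, 200))
sampled = sample_gesture_frames(segment)
print(len(sampled), sampled[0], sampled[-1])  # 25 100 199
```

In practice, each such 25-frame context would then feed the optical-flow extraction and the encoder-decoder network described above.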
