Computer Vision News - June 2021

Automatic Gesture Recognition in Surgical Robotics

Hence, the final training protocol consists of dividing the frames in each video clip according to the gestures, sampling 25 frames to encode a gesture context (1.67 s), and extracting the optical flow. An end-to-end deep encoder-decoder network is trained, and the information encoded in the learned representations is visualized using the UMAP algorithm, which reduces their dimensionality to a 2D plane. These representations are then shown to cluster into two distinct skill-based groups corresponding to beginners and experts. From this comes the paper's first finding: each gesture has a distinct representation depending on whether it was performed by an expert or a beginner.
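The frame-sampling step can be sketched as below. This is a minimal illustration, assuming uniform temporal sampling within each gesture segment (the article does not specify the exact sampling strategy); the function and variable names are hypothetical. Note that 25 frames covering a 1.67 s context implies roughly 15 fps.

```python
import numpy as np

def sample_gesture_frames(frame_indices, n_samples=25):
    """Evenly sample a fixed number of frames from one gesture segment.

    Hypothetical helper: the source only states that 25 frames
    (a 1.67 s gesture context) are sampled per gesture segment.
    """
    # Pick n_samples evenly spaced positions across the segment.
    idx = np.linspace(0, len(frame_indices) - 1, n_samples).round().astype(int)
    return [frame_indices[i] for i in idx]

# A hypothetical gesture segment spanning frames 100-199 of a clip.
segment = list(range(100, 200))
sampled = sample_gesture_frames(segment)
print(len(sampled), sampled[0], sampled[-1])  # 25 100 199
```

In practice, each such 25-frame context would then feed the optical-flow extraction and the encoder-decoder network described above.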
