ICCV Daily 2021 - Wednesday

“For the network, it was quite simple: we used a ResNet-18 backbone,” Viktor Rudnev tells us. “For the dataset, we had to develop our own GPU simulator that could produce arbitrary hand motions with all kinds of randomization, like lighting, backgrounds, and more. When combined, we got some great results on the synthetic data, but when we tried to apply this to real data, the results were very noisy. We tried different kinds of filters and ended up with a Kalman filter, which works best and allows us to combine fast motion prediction and slow motion without artifacts in between.”

At first, it was not clear how to feed this event stream into ResNet, which accepts images. The team had to come up with a new representation: a way of accumulating these events that does not lose the events' temporal resolution. They called it locally-normalized event surfaces (LNES).

Vlad tells us more: “The main idea of the LNES representation is that we convert the asynchronous event stream, which is not well consumed or understood by existing neural networks (such as convolutional neural networks, which capture the spatial dependencies and spatial context in a 2D image), into a representation that has such spatial context. Essentially, LNES is an accumulated subset of events in a certain time window, which is small enough to allow inference at very high frame rates, but nevertheless large enough to provide sufficient information as input to our methods.”
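To make the idea of accumulating events into an image-like surface more concrete, here is a minimal Python sketch. It assumes (our reading, not a detail confirmed in this interview) a two-channel surface, one channel per polarity, where each pixel stores the timestamp of its most recent event inside the window, normalized to [0, 1]. The `Event` tuple, the `lnes` function and the use of integer microsecond timestamps are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """Hypothetical container for a single event-camera event."""
    x: int         # pixel column
    y: int         # pixel row
    t: int         # timestamp in microseconds (assumed unit)
    polarity: int  # > 0 for a brightness increase, otherwise a decrease


def lnes(events: List[Event], height: int, width: int,
         t_start: int, window: int) -> List[List[List[float]]]:
    """Sketch of a locally-normalized event surface, shape 2 x height x width.

    Assumption: channel 1 holds positive-polarity events, channel 0 the rest;
    each pixel keeps the normalized timestamp of its latest event in
    [t_start, t_start + window). Pixels with no event in the window stay 0.0.
    """
    surface = [[[0.0] * width for _ in range(height)] for _ in range(2)]
    t_end = t_start + window
    # Visit events in time order so a later event overwrites an earlier one.
    for ev in sorted(events, key=lambda e: e.t):
        if not (t_start <= ev.t < t_end):
            continue  # event falls outside the current time window
        channel = 1 if ev.polarity > 0 else 0
        surface[channel][ev.y][ev.x] = (ev.t - t_start) / window
    return surface


if __name__ == "__main__":
    events = [
        Event(x=1, y=0, t=10_000, polarity=1),
        Event(x=1, y=0, t=15_000, polarity=1),  # overwrites the event above
        Event(x=2, y=1, t=5_000, polarity=0),
        Event(x=0, y=0, t=50_000, polarity=1),  # outside a 20 ms window
    ]
    surface = lnes(events, height=2, width=3, t_start=0, window=20_000)
    print(surface[1][0][1], surface[0][1][2])
```

Because every value is relative to its own window, the surface can be recomputed for short, sliding windows at a high rate, which matches the trade-off Vlad describes between a window small enough for fast inference and one large enough to carry useful information. Note that in this sketch an event exactly at the window start also maps to 0.0, so it is indistinguishable from an empty pixel.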