ICCV Daily 2019

Workshop presentation: Resource Efficient 3D Convolutional Neural Networks

Okan Köpüklü is a PhD student at the Technical University of Munich under the supervision of Professor Gerhard Rigoll. He speaks to us following his poster presentation for the Neural Architects workshop.

After the 2012 ImageNet Challenge, 2D CNNs came to dominate the majority of computer vision tasks. For a long time, the trend was to build deeper and wider CNN architectures to achieve higher accuracy. Eventually, applications with limited resources needed to be served, and so resource-efficient 2D CNNs began to be built. Okan says the same story is now repeating for 3D CNNs, but the field is still at the stage of pushing accuracy higher. The most recent example is the SlowFast network, which is trained on 128 GPUs. It is a very large model but achieves state-of-the-art results.

Okan and the team have implemented 3D versions of some famous 2D CNN architectures (MobileNet, ShuffleNet, MobileNetV2, ShuffleNetV2 and SqueezeNet) to investigate their performance on video classification tasks. They inflated the 2D CNN architectures to 3D and evaluated them on three major benchmarks: Kinetics-600, Jester and UCF-101. He explains why they chose these tasks:

“Kinetics-600 can measure the capacity of the experimented networks. For this task, the networks should capture the spatial patterns besides the motion patterns. There are nine different eating-something classes (the ‘something’ being a burger or a pizza, for example), so you need to capture the motion pattern together with the spatial information. For the Jester dataset, we wanted to investigate the network’s ability to capture motion patterns. In this dataset, there are 27 hand gestures with more or less the same spatial contents. There is a person in front of a camera performing a hand gesture, so you need to capture the motion of the hand in order to make the correct classification. Then we analyze UCF-101 to check the
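
To give a feel for what inflating a 2D architecture to 3D means for model size, here is a minimal sketch in Python. It is not the team's code: the helper names, the channel counts and the choice of a 3x3 kernel inflated to 3x3x3 are all assumptions made for illustration. It counts the weights of a standard convolution and of a depthwise separable convolution (the building block of MobileNet) before and after adding a temporal kernel dimension.

```python
def conv_params(c_in, c_out, kernel):
    """Weights in a standard convolution (bias ignored).

    `kernel` is a tuple of kernel sizes, e.g. (3, 3) or (3, 3, 3).
    """
    n = c_in * c_out
    for k in kernel:
        n *= k
    return n


def depthwise_separable_params(c_in, c_out, kernel):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = c_in  # one filter per input channel
    for k in kernel:
        depthwise *= k
    pointwise = c_in * c_out  # 1x1 (or 1x1x1) convolution mixing channels
    return depthwise + pointwise


def inflate(kernel_2d, depth):
    """Hypothetical helper: extend a (kh, kw) kernel to (depth, kh, kw)."""
    return (depth,) + tuple(kernel_2d)


# Illustrative values only; they are not taken from the article.
c_in, c_out = 64, 128
k2d = (3, 3)
k3d = inflate(k2d, 3)  # assumed temporal extent of 3

for name, k in (("2D", k2d), ("3D", k3d)):
    std = conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"{name}: standard={std}, depthwise-separable={sep}")
```

With these assumed numbers, the standard layer triples in size when inflated, while the depthwise separable layer grows by only about 13%, because the temporal dimension is added only to the small depthwise part.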

ICCV Daily 2019 - Tuesday