Computer Vision News - February 2019

CNNs jointly learning semantic segmentation and depth maps: this method uses CNN-based depth prediction together with SLAM to overcome the traditional limitations of monocular reconstruction, combining geometric building blocks like depth estimation with semantic segmentation to improve upon the traditional Visual SLAM pipeline.

Yin et al. presented an unsupervised architecture for jointly learning optical flow and motion estimation from video, all without pre-segmentation, which achieved state-of-the-art results for all tasks on the KITTI dataset. Their approach eliminates the need for annotation: it exploits the strong interdependence of the geometric vision tasks (depth, motion and optical flow) to construct a joint loss function based only on consistency checks.

In-map localization is an essential task for SLAM. The location is described by a 6-DOF pose, which can be reconstructed using features from feature-based pipelines like SfM. CNN-based approaches instead map a single RGB image, as captured by the camera, directly to a pose. One such method is Kendall et al.'s PoseNet, whose memory requirement does not grow linearly with the volume of input data. PoseNet proved robust to motion blur, different lighting conditions and camera factors that trip up SIFT. Brachmann et al. replaced the direct regression of the 6-DOF camera pose with a sequence of less complex tasks: first, a network learns to map limited image areas to 3D scene space; then, a differentiable RANSAC stage proposes a camera pose consistent with the previously predicted scene coordinates. This approach exploits geometric constraints to improve performance, while still providing all the advantages of an end-to-end trainable pipeline.

Bundle Adjustment: Before concluding, it is worth noting that there is currently no CNN-based Bundle Adjustment solution to speak of.
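To make the consistency-check idea concrete, here is a minimal NumPy sketch of one such check: the flow induced by predicted depth and camera motion (the rigid part of the scene) should agree with the predicted optical flow. This is only an illustration of the principle, not the authors' implementation; the function names, the pinhole intrinsics matrix K and the 4x4 relative pose T are assumptions.

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Flow induced by predicted depth plus camera motion (pinhole model).

    depth: (h, w) depth map, K: 3x3 intrinsics, T: 4x4 relative camera pose.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)    # back-project to 3D
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])   # homogeneous coords
    proj = K @ (T @ cam_h)[:3]                             # re-project into frame 2
    proj = proj[:2] / proj[2:3]
    return (proj - pix[:2]).reshape(2, h, w)               # pixel displacement

def consistency_loss(pred_flow, depth, K, T):
    """Mean L1 disagreement between predicted optical flow and rigid flow."""
    return np.abs(pred_flow - rigid_flow(depth, K, T)).mean()
```

With a static camera (T = identity) the rigid flow is zero everywhere, so a predicted flow of zero incurs no penalty; any inconsistency between the depth, motion and flow predictions raises the loss, which is what lets the tasks supervise each other without annotation.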
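The direct pose regression that PoseNet performs can be sketched with a simple loss balancing position and orientation error against the ground-truth 6-DOF pose. This is a hedged sketch of a PoseNet-style objective, not the paper's exact code; the weighting factor `beta` is scene-dependent and the value here is illustrative.

```python
import numpy as np

def pose_loss(pred_t, pred_q, true_t, true_q, beta=500.0):
    """PoseNet-style loss: position error plus weighted quaternion error.

    pred_t / true_t: 3-vector camera positions.
    pred_q / true_q: quaternions (true_q assumed unit-norm).
    beta: illustrative weight balancing metres against orientation units.
    """
    pred_q = pred_q / np.linalg.norm(pred_q)   # normalize predicted quaternion
    t_err = np.linalg.norm(pred_t - true_t)    # translational error
    q_err = np.linalg.norm(pred_q - true_q)    # rotational error
    return t_err + beta * q_err
```

Because the network regresses the pose from a single RGB image, the map is implicit in the network weights, which is why the memory footprint stays fixed rather than growing with the number of mapped images.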
The last year saw some preliminary attempts: some approaches try to model projection differently, others attempt to jointly learn parts of the Visual SLAM process using defined geometric components, but nothing concrete has emerged. This field still requires much thought and hard work.

Conclusion: In practice, CNNs are quickly becoming the go-to solution for vision tasks like object detection and semantic segmentation in Automated Driving, and they also show promising progress on geometric vision algorithms like depth estimation and optical flow. Nevertheless, progress on applying CNN-based approaches to Visual SLAM tasks is still slow. The paper provides an overview of Visual SLAM for Automated Driving and of potential opportunities for using CNNs at different stages of the process.
