CVPR Daily - Wednesday

Workshop Preview and Challenge

Sparse convolution-based methods have remained at the top of the 3D semantic segmentation leaderboard for the last two years. While point cloud methods have proven popular in recent times due to their minimalism and generality, they fail to capture the topology of shapes or the actual occupancy of the solid objects they represent. Voxels are a simple extension of pixels, but as a dense representation of 3D space they can be quite wasteful in both computation and memory, since regular voxel grids grow cubically with increasing linear dimensions. This is especially wasteful for 3D scenes where much of the space is simply “empty”.

Benjamin Graham and Chris Choy introduced the use of sparse convolutional networks over sparse 3D representations that ignore the “empty” space and focus only on the “active” cells. These methods allow for more efficient encoding and computation, and they are used in several of the winning approaches in this year’s challenge. Benjamin’s Submanifold Sparse Convolutional Networks (presented at CVPR 2018) introduced the idea of using sparse convolutions for 3D semantic segmentation by convolving over a reduced set of active cells and their neighbors (see figure). Chris Choy introduced the Minkowski ConvNet (presented at CVPR 2019), which generalized sparse convolutions to allow for dynamic, irregular convolutional kernels (see figure). Come and hear them present their latest work using these sparse convolutions and learn the details of how these methods work!

Figure from Submanifold Sparse Convolutional Networks. Left: Naive application of convolution to a sparse set of active cells (light colored) causes the number of active cells to grow rapidly (more light-colored cells). Right: Submanifold Sparse ConvNets keep the number of active cells fixed (again in light color), keeping computation and memory use down.

Figure from Chris Choy, illustrating the differences between sparse convolution and generalized sparse convolution.
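For readers curious about the mechanics, the sketch below is a minimal NumPy illustration (not the authors' code) of the submanifold idea: active voxels are stored sparsely as a map from integer coordinates to feature vectors, and the convolution is evaluated only at those existing active sites, gathering whichever neighbors in the 3x3x3 footprint happen to be active. The function name, data layout, and toy inputs are illustrative assumptions, not part of either paper's implementation.

```python
# Minimal sketch of a submanifold sparse 3D convolution (illustrative only).
import itertools
import numpy as np

def submanifold_sparse_conv3d(active, weights, bias=None):
    """active:  dict mapping integer (x, y, z) coordinates -> feature vector of shape (C_in,)
    weights: array of shape (3, 3, 3, C_in, C_out) for a 3x3x3 kernel
    Returns a dict over the SAME active sites, each carrying C_out output features."""
    c_out = weights.shape[-1]
    offsets = list(itertools.product((-1, 0, 1), repeat=3))   # 3x3x3 footprint
    out = {}
    for (x, y, z) in active:                                   # outputs only at existing active cells
        acc = np.zeros(c_out)
        for dx, dy, dz in offsets:
            nbr = active.get((x + dx, y + dy, z + dz))
            if nbr is not None:                                # skip "empty" space entirely
                acc += nbr @ weights[dx + 1, dy + 1, dz + 1]
        out[(x, y, z)] = acc + (bias if bias is not None else 0.0)
    return out

# Toy usage: two active voxels in an otherwise empty grid.
rng = np.random.default_rng(0)
active = {(0, 0, 0): rng.normal(size=4), (0, 0, 1): rng.normal(size=4)}
w = rng.normal(size=(3, 3, 3, 4, 8))
out = submanifold_sparse_conv3d(active, w)
assert set(out) == set(active)   # the active set does not dilate
```

The generalized sparse convolution behind the Minkowski ConvNet can be thought of as relaxing two things in this sketch: the fixed 3x3x3 offset list becomes an arbitrary set of integer offsets, and the output coordinates need not coincide with the input's active set.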

RkJQdWJsaXNoZXIy NTc3NzU=