Computer Vision News - January 2024

Estimating Generic 3D Room ...

In contrast, we propose the first method for annotating generic 3D room layouts on commonly available RGB videos. We ask human annotators to perform an easy task: draw an amodal segmentation mask for each structural element in 2D, and approximately mark its visible parts, also in 2D. Moreover, all annotation is performed on each video frame independently, without requiring the annotator to provide correspondences among video frames.

The simplification of the annotator task is enabled by shifting much of the work to an automatic method we propose, capable of deriving 3D room layouts from these 2D annotations. This method estimates a 3D plane equation for each structural element, as well as a finite spatial extent that captures all parts of the plane that are in the camera's field of view at any time during the video. We estimate all elements jointly and connect adjacent elements at the right contact edges in 3D, matching their shared edge as observed in the 2D annotations.

Using the proposed approach, we annotate 2246 scenes from the RealEstate10k dataset, which contains YouTube videos of indoor scenes. The rooms are complex and cover generic types, not limited to Cuboid/Manhattan. They can even be composite, such as two rooms connected by a door or a staircase. The method also works when the video does not show the full room (a common case).
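
To make the geometry concrete, here is a minimal sketch of two building blocks that a plane-based layout representation relies on: lifting an annotated 2D pixel onto a 3D plane, and finding the 3D edge shared by two adjacent planes. This is not the authors' implementation. It assumes a pinhole camera with a known intrinsics matrix K and planes written as normal . X + offset = 0 in camera coordinates, and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np


def backproject_to_plane(pixel, K, normal, offset):
    """Intersect the camera ray through `pixel` with the plane normal . X + offset = 0.

    Assumes a pinhole camera at the origin with intrinsics K (an assumption of
    this sketch, not something stated in the article).
    """
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction, camera coords
    denom = normal @ ray
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    t = -offset / denom  # distance along the ray, in units of `ray`
    if t <= 0:
        raise ValueError("plane lies behind the camera along this ray")
    return t * ray


def plane_intersection_line(n1, d1, n2, d2):
    """Return (point, unit direction) of the line shared by two non-parallel planes."""
    direction = np.cross(n1, n2)
    length = np.linalg.norm(direction)
    if length < 1e-9:
        raise ValueError("planes are parallel, no unique shared edge")
    # Pick the point on both planes that is closest to the origin along the line.
    A = np.stack([n1, n2, direction])
    b = np.array([-d1, -d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / length


# Toy scene (hypothetical values): a floor 1.5 m below the camera and a wall 4 m ahead.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
floor_n, floor_d = np.array([0.0, 1.0, 0.0]), -1.5  # y = 1.5 (y axis points down)
wall_n, wall_d = np.array([0.0, 0.0, 1.0]), -4.0    # z = 4

floor_point = backproject_to_plane((320, 490), K, floor_n, floor_d)
edge_point, edge_dir = plane_intersection_line(floor_n, floor_d, wall_n, wall_d)
```

In this toy scene, a pixel inside the floor mask lifts to a 3D point on the floor plane, and the floor and wall meet along a horizontal line. Connecting adjacent elements at their contact edge, as described above, amounts to requiring that such a 3D line agrees with the edge the two masks share in 2D.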