CVPR Daily - Friday

Oral & Award Candidate

VGGT: Visual Geometry Grounded Transformer

Jianyuan Wang is a joint PhD student at the University of Oxford's Visual Geometry Group and Meta AI. His paper introduces a superfast feed-forward reconstruction model, representing a significant advancement in 3D computer vision. Ahead of his oral presentation this afternoon, Jianyuan tells us more about his innovative work.

Jianyuan's paper proposes a novel feed-forward reconstruction model that processes multiple input images to generate a 3D reconstruction. Unlike prior classical and deep learning-based methods, which often rely on time-consuming test-time optimization, this model operates without such constraints. Optimization techniques such as bundle adjustment or global alignment can take minutes or longer to complete; in contrast, Jianyuan's model achieves reconstruction in seconds, significantly enhancing speed and efficiency.

"Such optimization steps are usually non-differentiable and can't work as a plug-and-play component in recent deep learning frameworks," he explains. "That's the bottleneck for 3D vision these days. Therefore, we go for a feed-forward-only model!"

Jianyuan identifies two major challenges in developing this model. The first was the need for a robust dataset to solve the problem in a data-driven manner. He collected 17 public datasets and processed them into a unified format, a task that required considerable engineering work. However, this was crucial because the quality of the data determines the limits for any method.

The second challenge involved ensuring the model's generalization ability. "We want the model to handle an arbitrary number of input frames during inference," he tells us. "Users may have only one frame…
