Computer Vision News

Jianyuan Wang has already observed follow-up works, including AnySplat, which utilizes VGGT's feature backbone to enable feed-forward Gaussian parameter prediction for novel view synthesis, and Spatial-MLLM, which combines its backbone with other large vision models to establish a unified foundation model for 3D perception.

"In the future, we could see further trials on 4D tasks," he envisions. "As we go from 2D to 3D, I think in probably two or three years, we'll have something good in 4D. In 4D, people dance, run, and many scenes are dynamic!"

In conclusion, while Jianyuan's model represents a significant step forward, he emphasizes that data-driven 3D vision is just the beginning. "As Rich Sutton said in 2019, general approaches that leverage computation will ultimately prove to be the most effective," he reflects. "This 'Bitter Lesson' has attracted great attention in the 2D and NLP communities, and we believe it's true for 3D as well. Feed-forward models will be the future of 3D vision."

NOTE: This article was published before the announcement of the award winners, which explains why it does not mention being a winning paper. Once again, we placed our bets on the right horse! Congratulations to Jianyuan and team for the brilliant win! And to the other winning papers too!