ICCV DAILY - Wednesday
Inside: Exclusive review of the ICCV 2025 Best Paper. We captured the story insightfully before the win ☺
This is what a Best Paper looks like!
Luca's Picks of the Day (Wednesday)

Hello, I'm Luca Collorone, a PhD student at Sapienza University of Rome, working with Fabio Galasso. During my PhD I have focused on human motion generation, aiming to design generative models that not only produce realistic motion but also understand and evaluate it. To this end, my research has explored several directions within the broader landscape of motion generation, including generation-guided motion anomaly detection, motion alignment through human feedback, and contrastive learning for motion representation.

At ICCV we will present MonSTeR, a unified latent space that captures higher-order relationships between skeletal motion, scene context, and textual descriptions. This representation enables flexible and robust retrieval across modalities and supports tasks such as motion captioning and zero-shot object placement within 3D scenes. By bridging these modalities, this work contributes to a deeper understanding of motion as a structured, multimodal phenomenon.

Luca's picks for today, Wednesday 22:

Orals:
3B-5 DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
4A-5 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Posters:
3-256 PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
3-62 PINO: Person-Interaction Noise Optimization for Long-Duration and …
Editorial

Aloha ICCV! Chee-hoo! We did it again! We have reviewed for you 6 of the 13 candidates for the Best Paper award. Believe it or not, once again we got the winner! I asked General Chair Gérard Médioni about his thoughts. He told me: You might need Google Translate for the last part ;-)

Hawaii is a magical place. The computer vision community came to the islands in 1991, 2001 and 2017 with CVPR, and many times for WACV. What better setting to witness the magic enabled by the continuing progress of the field! We sincerely hope you also find time outside of the conference to discover the rich culture and beauty of Hawaii.

Turn the page to read about the winning paper and hauʻoli i ka Hawaiʻi!

Ralph Anzarouth
Editor, Computer Vision News

Ralph's photo above was taken in peaceful, lovely and brave Odessa, Ukraine.

ICCV Daily Editor: Ralph Anzarouth
Publisher & Copyright: Computer Vision News. All rights reserved. Unauthorized reproduction is strictly forbidden. Our editorial choices are fully independent from IEEE, ICCV and the conference organizers.
Best Paper ICCV 2025!!!

Generating Physically Stable and Buildable Brick Structures from Text

This interview was conducted before the ICCV 2025 awards were known.

Ava Pun is a second-year PhD student at Carnegie Mellon University, under the supervision of Jun-Yan Zhu. Ava is also the shared first author (with Kangle Deng and Ruixuan Liu) of a great paper, which was accepted at ICCV as an oral, shortlisted as an award candidate, and went on to win the Best Paper award at ICCV 2025 ☺ Ahead of her oral and poster presentations today, Ava tells us more about this work.

Most generative AI techniques focus on generating things for the digital world: text, images, videos and digital 3D models. But what Ava and her co-authors wanted to do was to bring generative AI into the physical world, not the digital world. They wanted to generate objects that could actually be built from pre-made pieces in real life. And once they are built in real life, they should stand up and be stable.
In pursuit of that goal, they developed BrickGPT, a model that generates brick structures as a brick-by-brick list: structures made out of toy bricks, such as Lego bricks, that will stand up and not collapse when they are built in real life. Theirs is a quest for stability and physical possibility.

But what advantages do we gain once the generated structure is stable? If we can make a model generate stable 3D outputs, it could have a lot of applications in manufacturing, design and architecture. For example, we could design custom furniture for someone with specific needs - maybe the furniture has to be lower than usual. Or someone could design houses and buildings very quickly using generative AI techniques. Those houses and buildings, of course, will have to stand up.

One challenge was how to determine whether something is stable, or will be stable, when it is built in real life. Naively, that means running a full physics simulation, which could be very time-consuming and resource-intensive. So the team built a physics model that is specific to these toy bricks and accounts for all the forces applied to each brick. The model then uses an optimization technique to try to make all the forces balance out to zero. If that is possible, the structure will not move and it will not fall down - it is stable. Otherwise, it is unstable and it will fall down.
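For technically minded readers, here is a minimal sketch of that idea as a linear-programming feasibility test: stability holds if non-negative contact forces exist that cancel gravity for every brick. This is only an illustration of the principle described above, not the authors' code or their actual physics model; the matrix A and vector b are hypothetical placeholders that would in practice be assembled from the brick layout, with one force/torque balance equation per brick and one column per candidate contact force.

import numpy as np
from scipy.optimize import linprog

def is_stable(A: np.ndarray, b: np.ndarray) -> bool:
    """Return True if some non-negative force vector f satisfies A @ f = b."""
    n = A.shape[1]
    # Pure feasibility problem: the objective does not matter, so minimise 0.
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0  # status 0 means a feasible (balanced) solution exists

# Toy usage: one balance equation, one contact force that must carry the weight.
A_toy = np.array([[1.0]])
b_toy = np.array([9.81])
print(is_stable(A_toy, b_toy))  # True

As the next paragraphs explain, the team's force model also accounts for friction and the brick-to-brick contact points, which is exactly what off-the-shelf simulators handle poorly.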
We asked Ava what makes it difficult to assess stability. Testing whether a structure is stable is not a straightforward problem, because existing physics simulators can't reliably simulate the contact points between the bricks. That's why they developed a customized physics-reasoning algorithm. They simplified the model and developed this custom algorithm, which accounts for all the physical forces that each brick experiences due to gravity, contact and friction. Then, using this force model, they used an optimization algorithm to try to make all the forces sum to zero, which means that the structure would be in static equilibrium and won't shift or collapse - it is stable.

Why should we come to both the oral and the poster presentations today? "Because this is a very cool project!" is Ava's confident reply. "This is one of the first times that people have tried developing generative AI for the physical world. And if you come to our talk and our poster, you will
see some of the cool structures generated by BrickGPT that were actually built by humans in real life. We also made a robot system that picks up the bricks and puts them together, so it can also build the structures in real life. We're planning to bring some real bricks to the poster and build our structures there, so you can see them standing up and touch them."

As awesome as it sounds, we want to know why, out of more than 11,000 submitted papers, this paper has come in the top 13. Ava's guess is that the challenge they're trying to tackle is very applicable and very understandable to a lot of people. "Because everyone's played with Lego bricks before!" she declares. "Everyone knows how important it is to actually make the structure stable and make it buildable in real life. And bringing generative AI out of the digital world and into the physical world is something that a lot of people haven't seen before and would probably like to see, because it's something that we all experience every day in the physical world!"

The authors ended up running a user study where a bunch of people wrote prompts and submitted them, and the model returned instructions so that people could either build the structures themselves or send them to a robot, which would then build the results. "That was really cool! People liked it! They liked being able to take just a text prompt in their mind, send it to the computer, and then get this physical product that you could touch. That was really cool for me and everyone involved!"

A funny detail: during the process of developing the model, the team came up with many attempts that didn't work so well and generated many images of chairs that were obviously not very good. Ava was kind enough to share a set of these - here it is below.
Ava is firmly convinced that this work can open new directions. Taking the custom furniture generation example again, it is a harder problem than just making something stable, because someone has to sit on it and it has to be strong enough to hold their weight. The same goes for architecture, which has to be extremely stable: it's very bad if your chair collapses, but it's even worse if a house falls down. "These are definitely very exciting avenues to explore," Ava remarks. "Even though our model is tested and trained on these Lego toy bricks, the project isn't really about just Lego bricks. The project is about extending this to general buildability and stability, trying to make generative AI that produces things that can work in the real world. It's way more than just Lego!"

To learn more about Ava's award-winning work, visit Oral Session 4A: Vision + Graphics (Exhibit Hall III) today from 13:00 to 14:15 [Oral 2] and Poster Session 4 (Exhibit Hall I) from 14:30 to 16:30 [Poster 306].
Double-DIP
Don't miss the BEST OF ICCV 2025 in Computer Vision News of November. Subscribe for free and get it in your mailbox! Click here
Oral & Award Candidate

Certifiably Optimal Anisotropic Rotation Averaging

Carl Olsson is a full professor at Lund University. His ICCV 2025 paper was accepted as an oral in the 3D Pose Understanding session today, and it is also an award candidate paper! The work is about rotation averaging, a problem Carl has been working on for a long time, with enjoyment. In particular, Carl and his co-authors tried to approximate objective functions like the reprojection error and other objectives we have in vision, and to bring them into the rotation averaging framework, while retaining the ability to do global optimization over these functions.

Carl's main motivation is not solving a real-world problem. "I do it," he says, "because I enjoy these problems. That's my main motivation for this, but we know and we've seen fairly recently a couple of papers interested in global structure from motion, in which rotation averaging is a key component. And it's used a lot in SLAM, because you can get faster pipelines if you don't have to do bundle adjustment that often, so I think that's the main benefit of this. And then to be able to get an even more accurate objective function than what we have right now, I think it's a good thing!"

But why wasn't this problem solved until now? Carl doesn't really know. It feels almost like a trivial problem when you think about it. There is indeed a trivial modification of the old rotation averaging framework that seems like it is going to work, and it does work in some cases. But Carl was very surprised to see that when he tried it on structure-from-motion problems, it didn't work. The reason is in the mathematics, he found: "I had to dive into the mathematics and try to understand what went wrong. So I guess nobody really looked into it that much before!"
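For readers new to the topic, the standard isotropic chordal rotation averaging problem that this line of work builds on can be sketched as follows; this is background only, not the paper's exact anisotropic objective. Given noisy relative rotations \tilde{R}_{ij} between views i and j, one recovers absolute rotations R_1, ..., R_n by solving

\min_{R_1,\dots,R_n \in SO(3)} \; \sum_{(i,j) \in \mathcal{E}} \bigl\| R_j - R_i \tilde{R}_{ij} \bigr\|_F^2

Roughly speaking, an anisotropic formulation replaces this uniform Frobenius penalty with per-edge weightings that reflect how uncertain each estimated relative rotation is, and the contribution discussed here is doing so while keeping the problem certifiably (globally) optimizable; the precise objective and relaxation are in the paper.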
This is a classical computer vision problem - the 3D reconstruction problem, structure from motion - although Carl likes to take a mathematical spin on it. In fact, the main challenge of this work lies in the mathematics: understanding these functions and how they behave on the SO(3) manifold, the manifold of rotations. With this work, Carl is particularly proud of two things: the formulation that allows the objective functions to be approximated well, and the fact that they found out how to make it work in a global optimization framework.

How did Carl handle the process of working on this problem? It was a long process. The initial inspiration simply came from a paper that was doing local optimization on similar things. He thought that this could easily be incorporated into certifiably optimal rotation averaging frameworks. But then it turned out that it wasn't that easy: "I had to spend a lot of time," Carl explains, "digging into the equations here and then connecting it to some theory on optimization and convex envelopes, basically. It was a long process and lots and lots and lots of studying optimization."

In his regular work, Carl is in some sense more a mathematician than a computer vision scientist. But his application has always been structure from motion. Everything that he has done sort of relates
to this structure-from-motion problem: "It used to be kind of the main thing in computer vision, but it isn't anymore," he exclaims. "So I do have a bit of a different background than most people do."

Carl is quite surprised at being selected as a candidate for the Best Paper award. Of course, he thinks it's a very solid paper. "I don't know," he tells us. "I think maybe what sets it apart is that it's mathematically solid, and not that many papers nowadays are that. Most papers nowadays are designing networks for specific tasks without the need for any solid math. I'm not saying that this is a bad thing or anything. I'm just saying that this maybe sets my paper apart a bit."

Carl invites us all to his presentations, to hear about an interesting subject and some traditional old-school geometry within computer vision!
We asked co-author Yaroslava Lochman to tell us more: I feel quite honored that I got to be part of this project. It has been the first time I have looked into the global structure-from-motion / rotation averaging problem in depth. I am probably most excited about the elegance of the derivations and the proposed formulation, but also about the improvements it allowed us to obtain. At first, they seemed quite minor, although rather consistent. I saw high potential in this work and continued testing on other, more challenging scenarios, and there the improvements were quite substantial. Seeing that things work as expected, as there's a solid theory behind them, is very exciting!

To learn more about Carl and Yara's work, visit Oral Session 4B: 3D Pose Understanding (Kalakaua Ballroom) this afternoon from 13:00 to 14:15 [Oral 2] and Poster Session 4 (Exhibit Hall I) from 14:30 to 16:30 [Poster 305].

UKRAINE CORNER
ICCV's sister conference CVPR adopted a motion with a very large majority, condemning in the strongest possible terms the actions of the Russian Federation government in invading the sovereign state of Ukraine.
Poster Presentation

PixTalk: Controlling Photorealistic Image Processing and Editing with Language

Marcos Conde is a PhD candidate, about to graduate, at the University of Würzburg in Germany, under the supervision of Radu Timofte. He is also the first author of a great paper, which was accepted as a poster at ICCV 2025. Ahead of his poster presentation today, Marcos agreed to tell us about his work.

Marcos presenting at the ICCV 2025 tutorial: A Tour Through AI-powered Photography and Imaging.
PixTalk tackles a simple but powerful question: how can we make text-based image editing fast and accessible, without the need for complex diffusion models or powerful GPUs? As photographers, what are the core things you want to do to a photo? Marcos and his co-authors made the list and started to design a neural network able to do all those things, guided by your text instructions.

The key technology behind PixTalk - instead of the complex diffusion-based models that require very expensive GPUs - is a model that tackles these particular photography operations in real time. You can process everything up to 24 megapixels, which is more than 4K resolution, even on the classical Google Colab GPUs. And everything will happen essentially in real time, if there are no delays and the GPU is working properly. Amongst these operations, the paper shows, you can control the colors, the illumination, presets, color grading - anything that is important from the photographic point of view, and even from the cinematic point of view, like post-production.

"The original idea behind this work," Marcos explains, "was to say: OK, we have Adobe. But Adobe is quite difficult to use for the regular user. You have all these sliders, all these buttons and options. What if we can make a neural network that can do all of this, and you control it with language, with text? That's all! For this particular set of operations - more than 40 - we are basically like Adobe, but accessible for everyone!"

Marcos got the inspiration from Adobe Lightroom, which is the main tool for photography, at least for professional photographers. Usually, you can edit the white balance of the images, the exposure, the illumination. You can
edit the tint, the temperature, the color grading. You can change the saturation, the contrast, the highlights, and apply presets. Like 90 percent of everything you want to do in photography, you can do with this neural network.

Marcos is very much aware of the reasons why this was not done before. "I think in general," he tells us, "the community focused on the upper bound, on the complex models, exploring what is the best possible thing that we can get. But in our lab, we focus on the opposite, on the efficient models. So we start from the bottom and we try to add complexity. The rest of the world starts from complex models, trying to distill them, to make them smaller and more efficient. So I guess this was only possible with that kind of mindset. And this is actually the second work in this direction. Our previous work, InstructIR, which we presented at ECCV 2024, was also featured in different media sources and was the initial step. It was the first model that allowed you to restore images using text." Hence, this was the natural extension of the previous work, and both approaches are actually quite novel.

This does not happen without challenges. Designing a very efficient neural network, rather than relying on language models or diffusion models, is the key. "We don't tackle the problem using complex neural networks," Marcos declares. "We design them tailor-made for these operations. It took some time, because when you try to have one model that can do all these things without increasing the complexity much, you need to run a lot of experiments and a lot of trial and error. But we finally got it! And we are very happy that at the first try, at ICCV, we got three strong accepts! That is a very, very good indication of novelty!"

Marcos thinks that the fundamental problems in computer vision, at least the ones that we tackle in low-level computer vision, are anything related to cameras and computational photography. PixTalk tackles the problem of deblurring. It tackles the problem of denoising, because we want to enhance the photos. But it also tackles well-known
problems in photography, like white balance correction. It tackles problems from the perspective of color enhancement and advanced techniques for applying different local and global operations to the photos. "Again," he says enthusiastically, "the key aspect is that we can do all of this while allowing you to run it on your computer, because it is extremely efficient!"

What is Marcos particularly proud of in this work? Let him say it: "That we are the first approach that proves that you don't need diffusion models for photography editing!" The team will also release their code and dataset and everything, to contribute to the community. Hopefully, we will see these models being able to do very complex photography tasks even under budget constraints, using regular resources. For next year, Marcos will have something new, and he invites us to keep an eye on https://github.com/mv-lab/pixtalk.

Marcos is about to graduate this month. He has been working with Radu Timofte since the inception of the lab, and he thinks Radu is the best possible advisor one could choose, as a person and as a mentor. They plan to continue working together after graduation - on a new startup!

To learn more about Marcos' work, visit Poster Session 4 (Exhibit Hall I) from 14:30 to 16:30 [Poster 420].
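To give a flavour of what lightweight, instruction-guided editing can look like in code, here is a tiny text-conditioned network that predicts a handful of global photo parameters (an exposure gain and per-channel white-balance gains) from a precomputed text embedding and applies them to the image. This is an illustrative sketch only, not PixTalk's architecture; every module name, dimension and parameter choice below is a hypothetical placeholder.

import torch
import torch.nn as nn

class TextConditionedEditor(nn.Module):
    # Maps a precomputed text embedding (from any sentence encoder) to four
    # global editing parameters: one exposure gain and three white-balance gains.
    def __init__(self, text_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim, 128), nn.ReLU(),
            nn.Linear(128, 4),
        )

    def forward(self, image: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) in [0, 1]; text_emb: (B, text_dim)
        params = self.head(text_emb)
        exposure = torch.sigmoid(params[:, :1]) * 2.0   # scalar gain in (0, 2)
        wb = torch.sigmoid(params[:, 1:]) * 2.0         # per-channel gains in (0, 2)
        gains = (exposure * wb).view(-1, 3, 1, 1)
        return (image * gains).clamp(0.0, 1.0)

# Usage with random tensors standing in for a real image and text embedding.
model = TextConditionedEditor()
img = torch.rand(1, 3, 256, 256)
emb = torch.randn(1, 512)
print(model(img, emb).shape)  # torch.Size([1, 3, 256, 256])

The design point echoes what Marcos says in the interview: operating on a small set of classical photography adjustments, rather than synthesising pixels with a diffusion model, is what keeps editing cheap enough to run in real time on modest GPUs.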
Posters

Ivan Martinović, PhD student at the Faculty of Electrical Engineering and Computing, University of Zagreb, enthusiastically presenting what it takes to make open-vocabulary segmentation truly open, accepted as a poster at the "What is Next in Multimodal Foundation Models?" workshop.

When the object mutates its frame - Danda Paudel, faculty at INSAIT, Sofia University, presents ObjectRelator, a framework exploring how objects transform across ego- and exo-centric perspectives to bridge first- and third-person visual understanding.
No Comment
Women in Computer Vision

"But the thing that brings joy is when you go and present it to other people and you feel that, wow, they find it very practical!"
Elaheh Hatami

Elaheh Hatami is a computer graphics research scientist at OpsiClear, a spin-off from Case Western Reserve University.

Eli, tell me about your work.
As a computer graphics research scientist, I am involved in multiple projects on 3D and 4D scene reconstruction. We are prototyping it now. One application of that can be robotic surgery, in which they have multiple cameras but want to see the whole scene. We use a technique called Gaussian splatting, which is actually the quickest way to reconstruct the scene.

Maybe you can think of a couple of real-world applications where this work might be useful.
Sure! The final application is that you can have a 3D digital twin. For example, if you have an adorable dog doll from your childhood and you want to have the 360-degree view of it forever, we will scan it for you. Recently we have been talking to the Department of Physical Therapy: they watch patients, but sometimes don't have access to all the views. If the doctors can access the full 360-degree view, everything will be easier. Another fun application: it can be embedded in VR. For example, in education, you want to train a student or nurses about injections. You can mount it on the VR and they can feel like they are doing it in real time. Or education in the healthcare system, like medical students that have to be trained on bodies. We can have one 4D training session, and they can use it as many times as they want. It can even be for fun activities. For example, if you want to capture a 4D scene of your wedding. It has many applications!

Tell me, is it as fascinating as it sounds?
Oh, it always fascinates me, because you always find different aspects of it. I'm learning every day a new aspect of what we can do about that and how we can make it even more efficient. But the thing that brings joy is when you go and present it to other people and you feel that, wow, they find it very practical. That's good! That's the part that also fascinates me a lot!

Most Iranian scholars start studying in Iranian universities and then move to the USA or somewhere else for their PhD, while you are one of the very rare ones who did all their studies in Iran.
I'm glad you're familiar with the system. But yeah, I did my bachelor in computer science, specifically in software engineering, and then my master in AI. After that I started my PhD, again in AI and robotics. But I
had an opportunity to be a visiting scholar at MIT for a year. That was a great opportunity for me to come and do part of my research at MIT. So after I graduated in Iran, I came back to America as a postdoc.

It was your first time in America.
Right. I think I came here by the end of January 2017, and it was my first time here.

How was the first impact with America and MIT when you came for one year?
That was wonderful. I was fascinated by the research and the data and resources, and by the speed of research and the collaboration with different labs. That was amazing! At the same time, I learned a lot about the techniques for different data, since at that time I was working on brain data. I was learning more not only about the analysis, but also about the process of data collection and everything in between. It was also a great opportunity for me to go to conferences here, like VSS, the Vision Sciences Society, and another conference in New York about computational neuroscience, which was more about brain data and also gave me a chance to explore New York.

You did not hesitate much after your PhD to come back to America.
I really wanted to keep my research going. When I came back to Iran and was about to graduate, I applied for a postdoc. Unfortunately, COVID happened, and that made me lose a couple of opportunities. Finally, I managed to come back. Actually, I even came back during the pandemic. It was, I think, in November 2020. Because I applied for the visa before the pandemic, and there was some rule that if you stay in some area for two weeks and then come to the US, you're safe. So I went to Armenia for about a month, and then I moved here.

After several years as a postdoc, you moved to industry. What happened?
I always wanted to see my research in products. I wanted to still be in research, but I wanted my research goals to become products. That's why I started looking for research-related jobs. The startup I work with is a spin-off from a university; even my boss and the founders are university professors. That made it an easy transition for me. And I was super excited to see that I can do research as I was doing in university, but with more resources, as in industry. We actually started having some revenue, which is very exciting.

Do you ever go back to Iran?
No, I can't, because my visa is single-entry. It's complicated to move. I never go anywhere outside the USA.

Wow. We must change that!
Hopefully I get my green card soon.

Okay. I really hope you do. Tell me, is there anything from the academic world in America or in Iran that you regret? Something that you don't have anymore.
When I did my PhD in Iran, I was living in dorms and had more friends. But overall, I don't think I'm missing anything. And actually, I'm very excited about the opportunity!

We have spoken about the past and about the present. Where are you going? What is your future?
Oh, I'm super excited about the future. I want to be in a research lab in industry. My ideal goal is to have my own team and build meaningful products in industry. I'm paving the way by working on my skills. I am learning a lot to reach that. The short-term goal would be to get as much experience as I can and then gradually move towards my ultimate goal.

A difficult question now. What is one skill that you still have to acquire that you will need in the future?
It's a great question. From a technical point of view, to work more on leadership techniques. That's the part that I am excited to gain experience in.

You have plenty of years in front of
you to do that.
I hope so, but I'm very excited and want things to happen very fast.

What fascinates you outside science?
I love trying all different coffee shops and restaurants. I love long walks. I started doing that in Cleveland, but I really miss Boston. Boston was so walk-friendly: even, you know, at -30 degrees, you can go and walk and see many people outside, which I really love. I still walk. Biking, jogging and trying different foods.

All the world around you is made for you to eat and enjoy.
Yeah, exactly! Sometimes I even like to go and sit and just watch people.

You have studied many years in Iran. Tell us what it is like to study as a student at an Iranian university.
There are many pros and cons. For the pros, there is an entrance exam, and if you're ranked high, your tuition is covered. They subsidize everything, so I paid almost nothing for most of my education. Even the dorm is included; you pay less than 10 percent. But there is huge competition around that. You have to study during the whole of high school to get among the best.
Iranian girls are so good at STEM!
About 60 percent of university graduates are female. That's what I know; you can double-check. When you are in a STEM field, you have higher chances to migrate. That's probably why you see more of them abroad. The STEM field is highly valued. My parents never forced me, but I know that my dad wanted me to be a doctor. But I had a blood phobia; I couldn't even stay in the biology class. I was good at math, so I was like, "I think I want to be an engineer!" But I'm glad my brother did that and made him happy!

What will you always bring with you from Iran? What stays with you forever?
The poet that I love, Hafiz. I have a book with me. Digital versions of photos from my childhood and my parents. I also brought a couple of special things, like clothes or gifts. For some people it might not make sense, but for me, I just want to keep them!

Your message to the community.
I am very grateful for this community. I just want to say that I hope we could have more opportunities to be involved in the community. I'm sure there must be some way, but I would like it to be easier and more affordable.

Read 160 FASCINATING interviews with Women in Computer Vision!