A publication by Winter Conference on Applications of Computer Vision Sunday 2024 WACV
DAILY WACV Sunday

Workshop

Marissa Ramirez de Chanlatte is a PhD candidate at UC Berkeley, working with Trevor Darrell in the Berkeley AI Research Lab and Phil Colella at the Lawrence Berkeley National Laboratory. As the lead organizer of this afternoon’s interdisciplinary workshop, she gives us an insight into what the event has in store.

The inaugural Workshop on 3D Geometry Generation for Scientific Computing brings together diverse minds from AI and scientific communities to explore the state of the art in 3D geometry generation and how it can be applied to open problems in science.

Originally from a computational physics background, Marissa, the driving force behind the event, ended up in computer vision after recognizing a significant common need for accurate 3D geometries of the world. “I realized that my community in physics was about 20 years behind the state of the art in computer vision,” she recalls. “I turned myself into a computer vision researcher to bring the latest technology there, but in that journey, I found that researchers were doing essentially the same thing throughout different pockets of science. Whether it’s a tree for a forest, a glacier, the bottom of the ocean, or a black hole, we all need 3D geometry to do our interesting science.”

The motivation to organize a workshop stems from a lack of a central platform for researchers pursuing 3D reconstruction across domains. A reliance on word of mouth to connect like-minded individuals leads to missed opportunities, so this event offers a solution, establishing a space where people from different fields can meet, exchange ideas, and stay abreast of the latest advancements in computer vision.
3D Geometry Generation for Scientific Computing

Workflow for programmatically locating individual detected trees within raw drone images for computer vision-based species classification. (1) Tree is detected using CHM, orthomosaic, and/or point cloud. (2-3) Tree location is projected from 3D space onto the drone image. (4) The process is carried out for every drone image in which the focal tree appears, enabling multi-view species classification. - From “Aerial imagery, photogrammetry, and multi-view computer vision for forest mapping at the individual tree scale” by Derek Young, UC Davis.

Scientists, including Derek Young, a forest ecologist from UC Davis, and Emma “Mickey” MacKie, a glaciologist from the University of Florida, have been invited to share their groundbreaking work. The agenda emphasizes a two-way dialogue, encouraging the 3D reconstruction community to share their tools and insights with the scientific community while inviting applied scientists to articulate their specific challenges and applications.

“We’re going to all come together and understand best practices,” Marissa tells us. “Also, maybe motivate some research for the people in 3D reconstruction. Maybe a new way to think about problems.
There are some interesting things about these very large scientific datasets that are different from the standard benchmarks you see in every 3D reconstruction paper. I’m really excited to talk about that.”

The choice of WACV as the workshop’s venue was part timing and part strategy. Despite the potential for a larger audience at conferences like CVPR, the more intimate setting of WACV aligns with the event’s focus on community building. Additionally, the allure of the location helped. “It’s easy to convince new people to come to Hawaii!” she laughs. “This is my first WACV, but I’ve heard so many positive things about the community from my colleagues. I would love to have the workshop at every computer vision conference one day, so I’m hoping this is the first of many.”

As delegates contemplate competing workshops, the appeal of this one lies in its potential impact on addressing some of the most pressing global issues, such as climate change. 3D geometries of the forest can help mitigate an increase in wildfires, and work being done in glaciology is helping to model what will happen as ice caps melt.

“I think that’s all of our dreams when we create these technologies, that they’ll be used for something good and impactful,” Marissa points out. “The easiest way for that to happen is to sit in a room with the scientists doing that work and pitch your tools to them, listen to them, and figure out new research directions that really matter. Our community tends to interface less with these big scientific problems than other problems, such as VR or autonomous driving, which are closer to the vision community.”

Ultimately, Marissa expects participants will have different takeaways from the day. Those on the tool and 3D reconstruction side will likely gain fresh insights into new challenge problems, new potential datasets or benchmarks, and new ways to consider how the technology can be used.
Those on the application side will discover new tools that can enhance their work and make their lives easier. “I’m just hoping to have all these people in one room for the first time and build those connections and see the similarities,” she says.
“Reconstructing a glacier and a forest are not really that different! I think we can all learn from each other.”

The workshop’s call for papers welcomed novel and previously published work and works in progress. There will be no Best Paper award, which Marissa stresses is in line with its focus on community building and encouraging an inclusive, collaborative, and non-competitive atmosphere.

A diagram illustrating NeuroFluid, a differentiable two-stage network consisting of (i) a particle-driven neural renderer, which involves fluid physical properties in the volume rendering function, and (ii) a particle transition model optimized to reduce the differences between the rendered and the observed images. NeuroFluid provides the first solution to unsupervised learning of particle-based fluid dynamics by training these two models jointly. It is shown to reasonably estimate the underlying physics of fluids with different initial shapes, viscosity, and densities. - From “NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields” by Shanyan Guan, Huayu Deng, Yunbo Wang, and Xiaokang Yang

Having seen all the papers, were there any surprises in the mix?
“I wasn’t sure how much space there was for generation, but I’ve seen a lot of really clever ways of using 3D generation to solve scientific problems,” she tells us. “I had always thought of 3D reconstruction for science as faithfully reconstructing something that already exists, but in one paper, they were generating pieces for an aircraft and trying out different things. Being able to generate very fast allowed them to solve that more quickly. I’m excited to explore applying this generative technique to other areas!”

Marissa also spotted a trend for more physics-informed reconstruction. “That’s one of the interesting things about scientific applications,” she continues. “When you know the application, sometimes you can cheat a little. You have some extra priors and information that can improve your reconstruction. I love those papers as well.”

Time has been reserved toward the end of the workshop for everyone to engage in open-ended discussion on making tools more useful for scientists, strategies for sharing them more effectively, and identifying some common research goals.

The workshop perfectly encapsulates Marissa’s thesis work, which spans 3D reconstruction for science. “We have a reconstruction-to-physics pipeline,” she explains. “I work on a large-scale PDE solver at Lawrence Berkeley National Lab and also work on some reconstruction methods. The idea is, could you go from images to physics all in one go?”

Through her conversations with scientists, Marissa has discovered why many are not embracing some of the newer methods in 3D reconstruction. It often boils down to one thing: trust. “That led me down a work of looking into uncertainty in 3D reconstruction,” she reveals. “That’s also something I focus on.
How can we communicate the accuracy of these models to encourage more scientists to use them?”

To learn more about this topic, you are all invited to attend the Workshop on 3D Geometry Generation for Scientific Computing this afternoon at 13:00-17:00 (Naupaka VII). You can also head to invited speaker Emma “Mickey” MacKie’s webpage for a preview of what she will be talking about.
The Ultimate WACV Hero… ☺
Tutorial

The nuts and bolts of Uncertainty Quantification

Gianni Franchi is an assistant professor at ENSTA Paris. He speaks to us ahead of his fascinating interactive tutorial about the emerging field of uncertainty quantification in deep learning.
Deep learning techniques have become increasingly popular in recent years, demonstrating remarkable results across various domains. Until recently, the only question that tended to be asked of them was: Can we improve their performance? However, a new critical question has emerged with the rapidly evolving landscape of deep neural networks: Are they reliable?

“It turns out they’re not totally reliable,” Gianni tells us. “They have good results, but sometimes they have wrong results, and we don’t know why. It’s important to know if we can trust them. A colleague of mine once tried ChatGPT. He asked, ‘Can we buy – and he invented a word – in a supermarket?’ ChatGPT says, ‘Yes, of course, we can buy it!’ It can hallucinate a new word, but it’s a fake word. With a reliable confidence score, we could say, I don’t know this word. In some sense, we could know if we can trust ChatGPT or whatever.”

The community’s focus has traditionally been on accuracy and improving the performance of neural networks through increased size, architectural changes, and larger datasets. Reliability has not previously been a prominent concern, with very few papers on the topic and the absence of a dedicated track at most major computer vision conferences. Gianni aims to rectify this in his hands-on tutorial by providing a comprehensive introduction to uncertainty theory and its practical applications.

“During this tutorial, we’ll introduce all the basics,” he reveals. “Also, with Google Colab
and Jupyter Notebooks, you’ll go through the code with us and try to implement it. We’ve tried to take it from the theory to the application with the code all in one. Also, we’ve developed our library, TorchUncertainty, to work with uncertainty and PyTorch. We’ll try to get everyone implementing things with that.”

Despite a limited focus on uncertainty in general, successful workshops have already taken place at ECCV 2022 in Tel Aviv and ICCV 2023 in Paris, and thanks to a growing interest and recognition in the community, organizers plan to resubmit the workshop this year.

“We were in the major room of ICCV, and many people were interested,” Gianni recalls. “I think it’s the beginning of the field. The main thing is that more and more people work on uncertainty. I’ve talked with people organizing an upcoming big conference on computer vision. I won’t say the name, but they told me that uncertainty for them is still not a major part of it. For me, it’s still not, but it will be in the future.”

Looking toward the future, he hopes to encourage more researchers to explore and contribute to the understanding of uncertainty in deep learning. He is optimistic that one day soon, uncertainty will look more like the field of explainable AI, which he sees as an interesting and equally important topic, but one that is gaining traction and already has a lot of papers.
Gianni’s day job away from the tutorial similarly focuses on uncertainty and deep neural networks. He completed his PhD at MINES ParisTech, in a lab founded by Georges Matheron, a French mathematician instrumental in developing kriging. Based on Gaussian processes, this technique interpolates between observations and provides an estimate of the uncertainty of each prediction.

“From the beginning of my PhD and even my master’s thesis, I’ve been interested in the limitations of what we do, of machine learning and now deep neural networks,” he says. “It’s always interesting to see all the positives, but I think it’s also interesting to see the limitations.”

If this article piques your interest, visit Gianni’s tutorial tomorrow morning from 9:00-12:30 (Naupaka VII). Note: Please bring your laptop.

Editor Ralph Anzarouth and RSIP Vision wish to thank Nicole and all the organizers of this wonderful WACV 2024 conference. You guys rock!
Fireside Chat

“I don't think we have solved Computer Vision…”

“I am not an ivory tower professor, where I invent a problem, solve it, and ask if anybody cares…”

“These people [at Amazon] are even crazier than I thought: they are willing to give me the keys to the kingdom…”

“I think that if you try to compete with labs that have tens or hundreds of people, working on [...] creating the next LLM that's going to require billions or trillions of data, is not a good idea for an academic...”

“As a grad student, it is always good to work at two things at a time. Because when you hit a wall, the wall is not gonna go away. But if you go away from the wall, for a little bit, you can get around it...”
Gérard Medioni
Women in Computer Vision

Rita Cucchiara is a professor in Computer Vision at the University of Modena and Reggio Emilia, where she is also the Director of the Artificial Intelligence Research and Innovation Center (AIRI).

Rita posing with Ralph at ICCV 2017 in Venezia Lido, where she was Program Chair

Rita, tell us more about what you do.

For me, it’s more important that for about 20 years, I’ve been supervising a group of researchers called AImageLab in the “Enzo Ferrari” Department of Engineering, where we are now 53 people – seven professors, researchers, PhD students, one startup, and so on. In particular, I’m enjoying a lot that we’re researching computer vision.

Most of our readers probably know that you are an institution in our community. Of all your many roles, which one is the dearest to you?

I’ll tell you two other roles that I have: one national and one international. The national one is that I’m now writing the Italian Strategy on AI. I did the same with the past government, Draghi, and I did the same also with the other government. I’m contributing together
with a lot of colleagues. I believe this is important – not for me, but for the next generation because this technology is so important in the world. It should be important in Italy, and now the investment is also coming in Italy, and this is, I believe, one of the things I’m more proud of. After we wrote the strategy two years ago, we now have an enormous project for Italy that is more than €100m, called Future AI Research (FAIR). That takes together most of the universities in Italy. It’s paid with the Next Generation Europe money, so it’s not bad. It’s only for foundational AI research.

The other role is international because I think that to do something good in Italy, I need to know what everyone is doing around the world. Modena is now one of the units of the European Laboratory for Learning and Intelligent Systems (ELLIS). That is an association of more than 40 units in Europe and hundreds of people. Also, I’m working a lot in the computer vision community. This year, I’m happy because I will be the General Chair of CVPR. I was General Chair of ECCV 2022 in Tel Aviv and Program Chair of ICCV in Venice in 2017.

Our magazine is very proud to have accompanied you in all these roles.

It’s true, and thanks a lot for that.

Tell us more about the work you are doing.

The secret of my work is that I don’t do anything! [she laughs] No, no, I like delegating to people what they would like to do. I ask the students and the researchers with me to read a lot, to know a lot, and to have the creativity to invent something new. This is for research, but in my university, we also do a lot of applied research, especially because in Italy, we don’t have a lot of government funds. Most of the money we have comes from companies: mechanical companies that we have around, technological companies, international ones. I need people working with me to think about new research ideas and how to apply them in the practical world. I study a lot.
I try going around this conference to pick up some new ideas.

Are conferences a place where you pick up ideas?

I’m working mainly in computer vision. Even now, computer vision is a part of the big game of AI. In the last few years, I’ve been working more on language and vision together, and also language and generative AI. I think the communities that try to do all things together are very interesting. For my tradition, I come to CVPR, ICCV, ECCV, WACV, and so on. Also, the old ICPR, for me, is my story, so this is important. But also the other conferences, like NeurIPS or AAAI, are important because now everything is mixed in our technologies.
You briefly mentioned delegation. How can you be brave enough to delegate to others and trust that they will do things right?

If you don’t, you’ll kill yourself. We have only 24 hours in one day. In general, I believe that I make a lot of mistakes, so I accept my mistakes and I accept the mistakes of others.

Who taught you this?

I don’t know. [laughs] I have no one to cite, but the idea is that we must believe that others can do better than us. I don’t know if it’s just in Italy, but in general, people say that professors don’t want to have students better than themselves, so they try to find people more stupid and more stupid and more stupid until one person arrives who is not so stupid, but you don’t understand is not stupid, and will become better than you. I don’t want to do that. I try to find people that I don’t know if they are fantastic or stupid, but I would like them to try to be better than me. As a professor, my role is to make them as best as possible. For this reason, I have to trust in them. I have to trust the new PhD students and the full professors who are working with me in the same manner.

You have so many strings to your bow. Where do you get all your energy from?

Because I like my job, this is the reason.

Is that the main secret?

Yeah, I think so. If you like your job and you don’t want to stop learning, these are the two secrets.

What do you like about your job specifically?

I prefer to have new ideas. This is really very difficult. I’ll tell you another thing that my students say: in general, when I arrive in a meeting, I start to tell a million different things. They know that among them, one idea is good! [she laughs] The others I don’t want to know. What I like in my job is to try to find the next step to reach and the new problem that is still open. In computer vision and AI, there are so many open problems. I’ve been working in tracking and human analysis for 20 years or more, and it’s still not a solved problem.
20 years ago, we weren’t able to find one person in an empty space or follow them, even if they were walking in a straight situation with a Kalman filter. Now, we can do very complex systems, like recognizing people in a crowd, tracking them, or recognizing 3D, even looking at only one image. There are many, many things that 20 years ago were completely unbelievable, but I’m sure there is still a lot of work to do on this topic and many others.

Of all these ideas you have, do you prioritize yourself, or do you have someone who prioritizes for you?
[Rita laughs] I have no order, neither in my life nor my ideas. In general, often they’re goal-oriented because, as I told you, we have many different projects paid by problem, by companies, or paid by the European Community and so on. You have a lot of ideas, and then you have to put these ideas into a specific problem. I’ll give another example. We’ve started an interesting European project called the European Lighthouse of AI for Sustainability (ELIAS). That is coordinated by Italy, by Nicu Sebe in Trento. I have the role to coordinate the individual sustainability part because this is a project about environmental, societal, and individual sustainability. What does it mean? Privacy-preserving AI, personalization. What I like a lot at the moment is unlearning, so understanding there are some things that you can forget, not only to learn. Unlearning is a new topic in computer vision. It’s starting now. The idea is if you have a pre-trained network that does something, what you can do if it trains with something that you don’t want it to remember because of copyright, because of privacy, but also because it’s becoming obsolete.

Or it might be wrong.

Yeah, or it might be wrong. Stupid example: I have a robot that stays in my house, and I ask it, ‘Give me my ball.’ I play tennis, so my idea of a ball is a tennis ball. I don’t play basketball anymore, so I don’t want that. He remembers that this is a ball for me. The idea is to modify the knowledge - not only to remember everything, that is too easy, but just to know what is needed is something that humans are doing. Now, much research is devoted to this kind of personalization. That means understanding just only what is necessary. I’ll give you another example. Another European project I’m working with is called ELSA. All of them are from ELLIS, so it’s starting with a similar name. ELSA is the European Lighthouse on Secure and Safe AI. In both of them, we are looking at security.
First, detection, and also what we are doing now, what we call the Safe CLIP, so understanding if a system like CLIP, for instance, learned from violence, abuse, or, in general, unacceptable concepts, both in images and text. There are a lot of inadequate or, I say, toxic words and toxic images in what they learn. What we are doing now is trying to retrain the system for unlearning the violence. If you generate an image with a woman raped by a man, no, I would like that you generate a woman that is talking with a man, so understanding the concept, maintaining the theme, but forgetting the violence. These are some new experiments we are doing that, for me, are really very interesting. What we can do now in computer vision and in AI, so language, understanding the world,
understand how to modify the digital world and the real one. For the next 20 years – not myself because I’m old, but my students will have a lot of work to do to find new challenges and try to solve them.

Our readers know that we are both Italian, but they probably do not know that we are from the same province: Modena. How can Italy solve its structural and historical problems and be successful in AI?

Yeah, this is a problem, actually. We ask each other what we can do in Italy that cannot be done or at least that can be done in the same manner or in our manner with respect to the US, China, and the rest of Europe. We have to remember that Italy, also provinces like Modena, is very rich, and we have some excellence. I don’t want to say Ferrari; that is too easy.

Maserati.

Also, Maserati, but we have the fashion companies and luxury companies. We have food. We have food companies. The best restaurant in the world is in my city. We cannot solve enormous problems for all. We would like to work to improve life in small steps, one after the other. Point after point after point.

Like in tennis!

Like in tennis is true. I think that since we have some specificity. The cultural heritage, for instance, that is typical not only of Italy but is typical of Italy. Luxury products. The design. What we are doing, for instance, now in generative design, using computer vision and generative AI to help designers do this work better. We are doing it for ceramic tiles, and we are doing it for fashion because sometimes I think that we’re not interested only in completely autonomous systems, but I believe that our role is to do something that is better for humans and for our work. If you want to do autonomous driving, it’s okay, but I’m so happy that the cars are working better for parking instead of me, but give me the enjoyable situation to drive when it’s necessary to do that.

It took me 8 years to convince you to do this interview.
Tell everyone that it’s not too terrible!

No, absolutely. You’re a fantastic person, and I enjoy speaking with you a lot, even if not in an interview. But because I like to delegate, I told you I would prefer you interview my young person!

You’re adorable. Next time, I will. Do you have a final word for the community?

Our community is probably becoming too big, and our conferences are becoming too big, so there is a risk of losing themselves in too many things. Think about what you like and what you know that you can do, and you have not got to be an expert in everything. Do expert a lot in one thing. Work for 20 years in the same things!
My First WACV ☺

Jialin Yuan (top), a PhD Student from Oregon State University, presenting her work about detection of harmful multimodal content from the asymmetric angle in vision content and language content.

Neel Dey (bottom), a Postdoctoral Researcher at MIT CSAIL in Polina Golland’s Medical Vision Group, presenting his work on a synthetically-trained segmentor for 3D irregular blob-like shapes in *any* new and unseen radiology and microscopy dataset without needing any retraining, interaction, or adaptation.
Orals… ☺

Jad Abou-Chakra (top), presenting his oral work “ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields”. This work won the Best Paper Honorable Mention at WACV 2024.

Shristi Das Biswas (bottom), presenting her oral work “HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities”.
Award ☺

Mathias Unberath, an assistant professor of computer science at the Whiting School of Engineering at Johns Hopkins University - with secondary appointments in the School of Medicine - is one of 10 members of the JHU community who were awarded Career Impact Awards in November. The awards recognize individuals who have provided outstanding contributions to the professional development of their students and trainees. Mathias also had a poster at WACV 2024 in the Friday evening session: “RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-Centric Learning”.
Best Papers
Good Bye WACV 2024