Computer Vision News - April 2024

April 2024. Woman in Science: Elisa Roccia, p.22. Best of SPIE Medical Imaging, p.34.

CVPR Workshop

NTIRE2024: New Trends in Image Restoration and Enhancement

Radu Timofte is a Professor of Computer Science and a Humboldt Professor for AI and Computer Vision, leading the Computer Vision Laboratory at the University of Würzburg. He is here to tell us about NTIRE2024, the ninth edition of an exciting image restoration and enhancement workshop he co-organizes at CVPR 2024.

The New Trends in Image Restoration and Enhancement (NTIRE) workshop is a meeting place for professionals working with image and video, focusing primarily on low-level and mid-level vision. “We have to remind ourselves that computer vision itself is not possible without light and image sensors,” Radu begins. “We’re working on the side of the processing pipeline at the heart of computer vision algorithms.”

The workshop’s core themes revolve around image restoration, enhancement, and manipulation, aiming to bring more detail to images and videos by restoring degraded content, filling in missing information, and transforming them to achieve desired targets in terms of quality and performance. The event spotlights new trends and concepts, such as super-resolution, and creates a platform for academic and industrial participants to engage and collaborate.

“NTIRE targets quite a large spectrum of users because everybody is working with image and video,” Radu points out. “Our workshop stays closer to topics related to low-level vision, but even researchers who work on mid-level or high-level vision would enjoy seeing what happens after the processing steps are conducted to enhance the quality of an image or video and bring it to a level of usefulness.”

NTIRE’s journey began in Taipei at ACCV 2016, but it has been a core part of CVPR since 2017, with notable professors like Luc Van Gool from ETH Zurich being part of its organization from the start. Diverse teams from different universities and the industry have been coming together annually ever since to tackle new and challenging topics and problems and compete in the workshop’s associated challenges. This year, 17 challenges have been organized, each addressing different facets of low-level vision, restoration, and enhancement, attracting hundreds of solutions that gauge the state of the art.

“It’s been a fruitful and interesting journey,” Radu recalls. “I’m very grateful to my co-organizers and PC members. This year, we have around 30 organizers and 80 PC members, and they all have special duties, either in charge of the challenges or the reviewing process or just bringing interested industry or distinguished speakers to the workshop.” The team plans to invite five illustrious speakers, hoping to inspire new research directions, although their identities and topics will be confirmed in due course.

Reflecting on NTIRE’s significance within the CVPR landscape, Radu highlights its history as one of the most extensive workshops in terms of participation and paper presentations. Each person attending will find something that aligns with their interests or curiosities. Participants can expect a vibrant community eager to interact and contribute to ongoing advancements in the field.

Regarding what differentiates NTIRE from other workshops in this domain, Radu says it goes more into the details of the core problems in restoration and enhancement. However, there is more cross-pollination than division. “When we look at the research, it boils down to components, and those components are typically shared,” he explains. “It’s a very good thing that we’re sharing the components to reach some new combinations and ideas across fields.”

In addition to NTIRE, his team is co-organizing two other workshops at CVPR 2024 in June. AIS: Vision, Graphics, and AI for Streaming will be the first workshop on AI for Streaming at CVPR, taking restoration and enhancement problems into the realm of real-time, optimized algorithms for streaming. Meanwhile, the fourth Mobile AI workshop will address specific restoration and enhancement challenges for mobile devices like smartphones, where algorithm deployment faces particular time and memory constraints in pursuing the best user experience.

Radu’s lab’s day-to-day work overlaps with the workshops, focusing on topics typically seen as low-level vision and augmented reality (AR) and tackling the underlying challenges of machine learning and AI that power these algorithms in the background. “We’re handling restoration and enhancement and manipulation of image and video, seeing them in the context of typical streaming scenarios like social media, AR and communications,” he adds. “We’re very interested in placing these kinds of algorithms on mobile devices that are used everywhere.”

Lessons in Robotics

Vision-aided Screw Theory-based Inverse Kinematics Control of a Robot Arm Using Robot Operating System (ROS2) - Part 2

by Madi Babaiasl, Bryan MacGavin, Daniel Montes Tolon, Namrata Roy

Do you remember awesome Madi Babaiasl, who two years ago promptly taught me about redundancy in Robotics? A really fascinating moment! We asked her to prepare a full lesson for our readers. What she did with three of her students (they are all PhD students and one of them is a soon-to-be professor) is a great "educational" lesson – so complete that we had to divide it in 3 parts. Part 1 was published in the March issue of Computer Vision News. Here is part 2. Part 3 will be published in our May issue. You will find it in our archives during and after May 2024.

1 Introduction

In part 1 of this series of lessons, you learned how to set up your hardware and software to make everything ready to implement the vision-aided inverse kinematics control of the robot arm. In this lesson, you will learn some of the theory behind screw theory-based numerical inverse kinematics using the Newton-Raphson iterative method for robotic manipulators, so that you will be able to calculate the parameters and equations needed to implement your robotic arm’s numerical inverse kinematics. The equations for the robotic arm used in this series of lessons are provided for your reference.

2 Screw Theory-based Numerical Inverse Kinematics Using Newton-Raphson Iterative Method for Robotic Manipulators

Inverse kinematics of robotic arms is the problem of finding joint variables that produce a desired end-effector configuration. Formally, the inverse kinematics problem can be stated as follows: Given an $n$ degree of freedom open chain, with forward kinematics represented by the variable homogeneous transformation $T_{sb}(q)$, where $q \in \mathbb{R}^n$ are the joint variables, it is required to find a solution $q$ that satisfies $T_{sb}(q) = T_{sd}$, where $T_{sd}$ is a desired (and given) homogeneous transformation and $T_{sd} \in SE(3)$ (IK: given $T_{sd}$, find $q$ such that $T_{sb}(q) = T_{sd}$). Fig. 1 shows the difference between forward kinematics and inverse kinematics.

Figure 1: Forward Kinematics vs. Inverse Kinematics in Robotics

Inverse kinematics is a very important problem in the sense that here, we want to control the end-effector’s configuration for it to be able to interact with the world. However, it is a more complicated problem to solve than the forward kinematics. In forward kinematics, we will have a unique end-effector configuration for a given set of joint values, but the inverse kinematics problem can have zero, one, or multiple solutions for the joint values $q$ that can produce the given desired end-effector configuration.

There are two approaches to solving the inverse kinematics of an open-chain robot:

➢ Analytic approach to inverse kinematics in which closed-form solutions for the joint variables can be found. Here, geometric approaches to the problem are utilized. These analytic solutions may not always exist.

➢ Iterative numerical approach to inverse kinematics in which the Jacobian is used to iteratively find a solution using the Newton-Raphson method. An initial guess for the solution should be made, and then this method iteratively pushes the initial guess towards a solution. This approach will only give us one solution and not all possible solutions.

If the inverse kinematics equations cannot be solved analytically, iterative numerical techniques may be utilized. Moreover, numerical methods are frequently employed to enhance the precision of analytic solutions, even when such solutions are available. In this case, the analytic solution can be used as an initial guess for the iterative numerical approach. For this purpose, we will use the Newton-Raphson method, which is a fundamental method in nonlinear root finding. In situations where no exact solution exists, we will need optimization methods to find the closest approximate solution. If the manipulator is redundant, there will be infinitely many solutions to the inverse kinematics problem. In this case, we need to find a solution that is optimal with respect to some criterion.

Now, let’s start by providing a general algorithm for numerical inverse kinematics using the Newton-Raphson method. Suppose that $T_{sb}(q)$ is the pose of the end-effector frame {b} in the base frame {s} (calculated from the forward kinematics) and the desired end-effector configuration is given by the transformation matrix $T_{sd}$ as depicted in Fig. 2 below.

Figure 2: Illustration of the pose of the end-effector frame in the base frame, $T_{sb}(q)$, and the desired end-effector configuration given by the transformation matrix $T_{sd}$.

Solving the inverse kinematics means that we need to find the set of joint angles that can

take the end-effector frame {b} to the desired frame {d}. To learn about reference frame assignment in robotics and the homogeneous transformation matrices to represent pose in robotics, you can refer to the orientation in robotics and pose in robotics lessons. To have a brief review of the Newton-Raphson method for nonlinear root finding, you can refer to this lesson.

To find the numerical screw theory-based inverse kinematics of open-chain manipulators, follow this algorithm:

1. Initialization: Given $T_{sd}$ (transformation of the desired frame with respect to the space frame) and an initial guess of joint variables $q^0 \in \mathbb{R}^n$, set $i = 0$.
2. Set $[\mathcal{V}_b] = \log\left(T_{sb}^{-1}(q^i)\, T_{sd}\right)$. While the algorithm is not converged:
➢ Set $q^{i+1} = q^i + J_b^{\dagger}(q^i)\, \mathcal{V}_b$
➢ Increment $i$

Where $q^i$ is the current joint angles guess, $\mathcal{V}_b$ is the body twist calculated from the error between the current end-effector configuration and the desired configuration (re-evaluated at each new guess), and $J_b^{\dagger}(q^i)$ is the pseudoinverse of the body Jacobian matrix evaluated at the current joint angles $q^i$. To learn about how to calculate the forward kinematics of robot manipulators, refer to this lesson, for Twists refer to this lesson, and to learn about screw theory based Jacobian matrix calculation, refer to this lesson.
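To get a feel for the update rule before applying it to twists, here is a minimal sketch of the same Newton-Raphson iteration on a planar two-link arm. It is a simplified analogue, not the robot from this lesson: the link lengths, the function names, and the use of a plain position error in place of the body twist are all assumptions made for illustration. Because the Jacobian here is square and nonsingular, its pseudoinverse coincides with its inverse.

```python
import math

# Hypothetical planar two-link arm; the link lengths are illustrative only.
LINK_1, LINK_2 = 1.0, 1.0

def forward(q):
    """Forward kinematics: tip position (x, y) for joint angles q = (q1, q2)."""
    q1, q2 = q
    return (LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2),
            LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2))

def jacobian(q):
    """2x2 Jacobian of the tip position with respect to the joint angles."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return ((-LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12),
            (LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12))

def newton_raphson_ik(target, q0, tol=1e-10, max_iter=50):
    """Iterate q <- q + J^{-1}(q) * error until the tip reaches the target."""
    q = list(q0)
    for i in range(max_iter):
        x, y = forward(q)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            return q, i
        (a, b), (c, d) = jacobian(q)
        det = a * d - b * c
        if abs(det) < 1e-12:
            raise ValueError("singular Jacobian; a pseudoinverse or damping is needed")
        # Square, nonsingular Jacobian: the pseudoinverse equals the inverse.
        q[0] += (d * ex - b * ey) / det
        q[1] += (-c * ex + a * ey) / det
    raise RuntimeError("Newton-Raphson did not converge")

# Build a reachable target from known joint angles, then start from a nearby guess.
target = forward((0.5, 0.8))
q_sol, iterations = newton_raphson_ik(target, q0=(0.3, 0.6))
```

As in the algorithm above, the method returns only the one solution that the initial guess leads to; a different guess can land on the other elbow configuration.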

Figure 3: Robot arm depicted in home position with screw axes and link length assignments that are essential to find the forward kinematics. The lengths depicted in the right photo are: L1 = 0.08945, L2 = 0.1, Lm = 0.035, L3 = 0.1, and L4 = 0.08605.

For the robot arm that we use in this lesson, the screw axes of the joints expressed in the base frame in the robot’s zero position (see Fig. 3 for link lengths and screw axes assignments) are as follows:

[Equation: screw axes $\mathcal{S}_1, \ldots, \mathcal{S}_4$]

and thus, $T_{sb}(q)$ can be written as the following equation using the Product of Exponentials (PoE) formula:

$$T_{sb}(q) = e^{[\mathcal{S}_1] q_1}\, e^{[\mathcal{S}_2] q_2}\, e^{[\mathcal{S}_3] q_3}\, e^{[\mathcal{S}_4] q_4}\, M$$

Where $q_1$, $q_2$, $q_3$, and $q_4$ are joint angles, and $M$ is the end-effector configuration when the robot is at its zero position and for our robot arm it is represented by the following matrix:

[Equation: matrix $M$]

The space Jacobian of this robot can be expressed as:

$$J_s(q) = \begin{bmatrix} J_{s1} & J_{s2}(q) & J_{s3}(q) & J_{s4}(q) \end{bmatrix}$$

Where $J_{s1}$ and $J_{si}(q)$ can be calculated using the following equations:

$$J_{s1} = \mathcal{S}_1, \qquad J_{si}(q) = [\mathrm{Ad}_{T_{i-1}}]\, \mathcal{S}_i, \qquad T_{i-1} = e^{[\mathcal{S}_1] q_1} \cdots e^{[\mathcal{S}_{i-1}] q_{i-1}}, \qquad i = 2, \ldots, 4$$

in which $\mathcal{S}_i$ is an expression for the screw axis describing the ith joint axis in terms of the fixed frame with the robot in its zero position given above, and $J_{si}(q)$ is the screw axis describing the ith joint axis, but after it undergoes the rigid body displacement $T_{i-1}$ instead of being at zero position. In other words, it is the Adjoint map of the screw axis for when the robot is no longer in zero position. Note that if

$$T = \begin{bmatrix} R & p \\ 0 & 1 \end{bmatrix}$$

is a transformation matrix where $R$ is the rotation matrix and $p$ is the position vector, then its adjoint representation can be calculated by:

$$[\mathrm{Ad}_T] = \begin{bmatrix} R & 0 \\ [p] R & R \end{bmatrix}$$

Note that the bracket notation $[p]$ is the 3 × 3 skew-symmetric matrix representation of the position vector $p$. To learn more about the skew-symmetric matrix representation of a vector, you can refer to this lesson.

The body Jacobian can then be calculated from the space Jacobian using the adjoint transformation (to learn more about the adjoint transformation in robotics, refer to this lesson):

$$J_b(q) = [\mathrm{Ad}_{T_{bs}}]\, J_s(q)$$

where $T_{bs} = T_{sb}^{-1}(q)$ is the inverse of the homogeneous matrix that was derived from forward kinematics earlier.

3 Summary

Part 2 of the lesson series explored the theory behind screw theory-based numerical inverse kinematics for robotic manipulators, focusing on the Newton-Raphson iterative method. It also aimed to equip learners with the ability to calculate parameters and equations necessary for implementing a robot arm’s numerical inverse kinematics. In the next part, we will start working on implementing the vision-aided numerical inverse kinematics control of the robot arm.

Computer Vision News is very grateful to Madi and her team for another awesome lesson in robotics!
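As a companion to the formulas of Section 2, the sketch below strings together the Product of Exponentials forward kinematics, the space and body Jacobians, and the Newton-Raphson loop using NumPy. It is one possible arrangement under stated assumptions, not the authors' implementation: every function name is hypothetical, the three-joint arm with unit link lengths is made up for the demonstration (it is not the robot of Fig. 3), the convergence thresholds `eps_w` and `eps_v` are assumed values, and the matrix logarithm does not handle rotation angles close to π.

```python
import numpy as np

def skew(w):
    """3-vector -> 3x3 skew-symmetric matrix [w]."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def exp_screw(S, theta):
    """e^{[S] theta} for a screw axis S = (omega, v), with ||omega|| = 1 or omega = 0."""
    w, v = S[:3], S[3:]
    T = np.eye(4)
    if np.linalg.norm(w) < 1e-12:              # prismatic joint: pure translation
        T[:3, 3] = v * theta
        return T
    W = skew(w)
    T[:3, :3] = np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * W @ W
    G = np.eye(3) * theta + (1 - np.cos(theta)) * W + (theta - np.sin(theta)) * W @ W
    T[:3, 3] = G @ v
    return T

def log_se3(T):
    """Twist (omega*theta, v*theta) whose exponential is T; not valid near theta = pi."""
    R, p = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-9:                           # no rotation: pure translation
        return np.r_[np.zeros(3), p]
    W = (R - R.T) / (2 * np.sin(theta))        # unit rotation axis in skew form
    G_inv = np.eye(3) / theta - W / 2 + (1 / theta - 0.5 / np.tan(theta / 2)) * W @ W
    return np.r_[W[2, 1], W[0, 2], W[1, 0], G_inv @ p] * theta

def adjoint(T):
    """6x6 adjoint representation [Ad_T] = [[R, 0], [[p]R, R]]."""
    R, p = T[:3, :3], T[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[3:, :3] = skew(p) @ R
    Ad[3:, 3:] = R
    return Ad

def fk_space(M, S_list, q):
    """T_sb(q) = e^{[S1] q1} ... e^{[Sn] qn} M."""
    T = np.eye(4)
    for S, qi in zip(S_list, q):
        T = T @ exp_screw(S, qi)
    return T @ M

def jacobian_space(S_list, q):
    """Columns J_s1 = S1 and J_si = [Ad_{T_{i-1}}] S_i."""
    T = np.eye(4)
    cols = []
    for S, qi in zip(S_list, q):
        cols.append(adjoint(T) @ S)
        T = T @ exp_screw(S, qi)
    return np.column_stack(cols)

def ik_body(M, S_list, T_sd, q0, eps_w=1e-6, eps_v=1e-6, max_iter=100):
    """Newton-Raphson IK: q <- q + pinv(J_b(q)) V_b until the error twist is small."""
    q = np.array(q0, dtype=float)
    for i in range(max_iter):
        T_sb = fk_space(M, S_list, q)
        V_b = log_se3(np.linalg.inv(T_sb) @ T_sd)      # error twist in the body frame
        if np.linalg.norm(V_b[:3]) < eps_w and np.linalg.norm(V_b[3:]) < eps_v:
            return q, i
        J_b = adjoint(np.linalg.inv(T_sb)) @ jacobian_space(S_list, q)
        q = q + np.linalg.pinv(J_b) @ V_b
    raise RuntimeError("IK did not converge")

# Hypothetical 3-joint arm: a waist joint about z, then two joints about y.
S_list = [np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0]),
          np.array([0.0, 1.0, 0.0, -1.0, 0.0, 0.0]),   # y-axis through (0, 0, 1)
          np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0])]   # y-axis through (1, 0, 1)
M = np.eye(4)
M[:3, 3] = [2.0, 0.0, 1.0]                             # tip pose at the zero position

T_sd = fk_space(M, S_list, np.array([0.3, -0.4, 0.6])) # a reachable desired pose
q_sol, iterations = ik_body(M, S_list, T_sd, q0=[0.2, -0.3, 0.5])
```

To adapt the sketch to another arm, only `S_list` and `M` need to change. Building `T_sd` from known joint angles, as done here, is a convenient way to check the solver on a target that is guaranteed to be reachable.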



Congrats, Doctor Daniele!

Daniele Berardini recently completed his PhD with the Vision Robotic and Artificial Intelligence (VRAI) group at Università Politecnica delle Marche, receiving his degree cum laude. His thesis focused on developing Edge AI-enabled monitoring systems. He is now a Postdoctoral Researcher, still at UNIVPM. Congrats, Doctor Daniele!

The recent progress in Computer Vision is driven by data growth, computational advancements, and a scientific focus on innovative but increasingly complex AI algorithms like Large Vision Models. These algorithms require huge computational power, incurring high costs and energy usage, which raise environmental issues and restrict technology access and affordability. Edge AI addresses these issues by providing decentralized AI solutions suitable for low-cost edge devices, despite challenges in developing AI models balancing execution speed and accuracy on low-power devices.

My thesis aims to contribute to the integration of Edge AI in Computer Vision by focusing on the design and development of lightweight, deep learning-based monitoring systems for the real-time analysis of images and videos. It explores two domains, each with its own challenges: security, focusing on weapon detection in surveillance videos, and healthcare, focusing on segmenting preterm infants’ limb poses from depth data.

Regarding the security domain, in collaboration with INIM Electronics, an Italian leader in security systems, I tackled weapon detection challenges, notably the difficulty in identifying small-sized weapons and the need for real-time weapon localization. Current solutions, like image Super Resolution (SR) methods or complex detection architectures, are inapplicable on edge devices due to computational constraints. To address this, after the creation of a surveillance dataset for weapon detection (WeaponSense), I proposed the first Edge AI framework for real-time weapon detection in surveillance videos through the use of two cascaded CNNs, optimized for edge devices. Despite the results improving the state of the art, the framework shows limitations on efficiency in crowded environments. Thus, in collaboration with the University of Córdoba (Spain), I proposed a novel method that integrates during training an Enhanced Deep Super Resolution network into an edge-oriented CNN for weapon detection, discarding the former during inference. The proposed approach overcomes the previous limitations, enabling accurate and real-time on-device localization of weapons (Figure 1).

In the medical domain, my research was motivated by the need for automated technology to continuously monitor the movement of preterm infants, which is essential for early assessment of potential long-term complications. Current methods are effective but require very high operational costs, hindering their implementation in budget-limited facilities. Driven by these premises, I initially proposed a CNN that incorporates lightweight computational blocks from a segmentation network (EDANet) into the bi-branch structure of a preterm infants’ pose segmentation network (BabyPoseNet). Subsequently, by conducting a per-layer complexity analysis of the proposed CNN, I redistributed computation throughout the network to reduce complexity. This approach yielded a real-time framework capable of running on edge devices, achieving optimal accuracy in segmenting infants’ limbs.

Ultimately, my thesis aims to push research toward more sustainable and affordable AI solutions in different domains, valuing not only high accuracy, but also the efficiency and adaptability across various needs and contexts.

Robotic Drone Contest

Simone Mentasti (standing left), a researcher at Politecnico di Milano, has recently gained recognition as part of the winning AIRLab POLIMI team at the Robotic Drone Contest during the European Robotics Forum (ERF) held in Rimini last month. He speaks to us about the competition and the innovative drone robotics that helped his team (with Matteo Molinari and Mattia Giurato) scoop the top prize.

The Robotic Drone Contest is a competition supported by Leonardo, an Italian aerospace, defense, and security company. Initially involving six universities in Italy, it has recently expanded its reach to include institutions from other countries.

Simone tells us that for the first three years, the contest focused on participants building aerial drones and developing algorithms for navigation in new environments. Last year, this focus expanded to multi-agent cooperation. “In this scenario, we still have drones, but we also have a ground robot,” he says. “The goal is to develop algorithms that allow cooperation between these agents to explore a partially unknown environment and identify and track targets of interest.”

Winning the contest requires a strategic approach. Participants go through multiple rounds, aiming to locate specific targets within the environment accurately and efficiently. Success is measured by the time taken to find targets and the precision of their positioning on a map, leading to a decisive formula for determining the winner.

However, securing the victory comes at the end of a challenging process. Simone highlights the delicate balance between computational power and flight time in design considerations. “For our task, we required a lot of computational power because we had to identify targets and obstacles fully autonomously,” he explains. “We had a trade-off between how long the drone could fly and the payload computationally on board. We added more computational power, boards, and sensors, so the drone’s flight time constantly reduced with increased power on board.”

All the computation is performed onboard the drone, with only limited information provided to the ground control stations. Simone points out that part of the computation could have been offloaded to agents, but not doing this helped to optimize performance in a scenario where bandwidth was limited and the Wi-Fi connection was unstable.

Reflecting on their winning strategy, Simone credits the team’s success to a robust and efficient solution rather than advanced algorithms. “We wrote most of the communication on layers and most of the libraries to be really lightweight,” he reveals. “We transmitted only the data we needed, and most of the computation was done on the different agents. Each one was independent and able to carry out its own task without having to rely on information that could be disrupted by the Wi-Fi signal, which was not very good, and stop the movement of the robots.”

Despite their achievements, Simone recalls initial hurdles in developing navigation algorithms from scratch, which made it challenging to design functions like obstacle avoidance and path planning. They switched to ROS-integrated control algorithms to solve this, which worked well.

Away from competitions, Simone’s core research focuses on autonomous vehicles and vehicle-infrastructure cooperation, and he is part of a team that performs agricultural robotics. Looking ahead, he is already gearing up for the next round of the competition in October. “Now, we’re starting our new round of development,” he declares. “We still have some ideas on how to improve the robots on the detection part, the navigation, and the controller, particularly on the drone side. We have a new team starting to work on all those aspects to improve the solution for the new round in October!”

Don’t miss the BEST of SPIE!

“It’s been a privilege to witness the range of innovation and excellence displayed at SPIE Medical Imaging this year,” said RSIP Vision’s CTO Ilya Kovler. “Across over 800 papers and a broad range of topics, the conference has been a showcase of cutting-edge technology and visionary research. My congratulations to the award winners. Their work sets a high standard for the entire community. I was particularly inspired by efforts to integrate classical or more traditional imaging approaches with modern AI advancements, drawing on the benefits of both to improve the state of the art. The conscious focus from participants on clinical applications and improving patient care has left me with a newfound excitement about the possibilities that lie ahead to transform healthcare through computer vision in medical imaging!”

In Memoriam

Emlyn Roy Davies (29.11.1940 - 29.2.2024)

I (Ralph) interviewed Prof. E. Roy Davies exactly two years ago, when he co-edited a new book with Matthew Turk: Advanced Methods and Deep Learning in Computer Vision. Roy was Emeritus Professor of Computer Vision at Royal Holloway, University of London. I remember him as a very kind and humble man. Our sympathies to his daughter Sarah and family.


Women in Science

Elisa Roccia is the Global Clinical Marketing Manager for MRI Oncology at Siemens Healthineers.

Elisa, can you tell us about your work?

At Siemens Healthineers, I’m responsible for the MRI Oncology global marketing strategy within the magnetic resonance imaging business line. The main topics I cover are breast cancer, liver cancer, and prostate cancer, but I cover a few other minor topics as well. My position is similar to a product management position, but facing more towards the outside of the company. I act as a bridge between the clinical world, what’s happening in the field, in the community, and what happens inside Siemens Healthineers.

Last time we spoke, you had just completed your PhD. You have strong research training but seem to be doing less research and more product management now. How did that happen?

Yeah, that’s right. I do come from a research background. I studied biomedical engineering as an undergrad in Italy. Then, I did my PhD at King’s College London. The PhD was very technical indeed. I developed MR sequences to acquire MR images. I programmed a lot on my laptop and had to study many new things, from the physics of the MR system to C++ programming, for instance.

During the PhD, because we were in a very nice, stimulating environment at King’s College, I had the chance to do many other activities alongside my PhD. For instance, I was very much involved in public engagement. We were going to different schools, talking with kids about what we were doing as part of our PhD, what the research was about, and what it means to be a scientist. Other events that come to my mind are the Royal Summer Science Exhibition, where we talked with the general public about our work and research. All these activities were super engaging to me. I was always so happy to talk in these more lay terms about science. I also enjoyed organizing events and conferences, so I understood that what I was missing as part of my PhD was more of a communication and human aspect.

The position I have right now at Siemens Healthineers is a really good combination of these two aspects, because I need to use my technical knowledge to translate what happens from the technical point of view into more easily understandable concepts.

I imagine speaking to kids and the public is very different from being a marketing manager competing in the global market with all the sharks in the ocean.

Yeah, no, absolutely. At Siemens Healthineers, I’ve been very lucky to be working in the headquarters team, which is based in Erlangen, Germany. There, I met many brilliant colleagues, and it’s extremely stimulating to have all these people around you with so much knowledge about MRI in particular. It’s an environment that fosters being very competitive and at the top of what’s happening. In MRI in particular, there’s always something going on from the research perspective; it’s an ever-evolving field. This stimulates advancements in the field and challenges us to remain on top.

With access to MRI remaining very competitive, do you ever worry that people will work more on cheaper and more accessible technologies like X-ray and ultrasound instead?

The access aspect is definitely a real issue, but nowadays we have the right technologies and innovations that enable us to make MRI more accessible. One of these is the fact that the MRI acquisition is much faster than it used to be. If you have a shorter examination time, you can scan more patients. This means that you have a higher patient throughput, which balances the costs of the MRI systems. As we’re talking in a computer vision journal, maybe we can also mention that these technologies are often based on deep learning – how we can speed up the scan and the reconstruction of the images. This technology has been really disruptive in the past few years, and this is just one of the examples of how we can improve access to MRI, in addition to having a broad portfolio of systems that can meet different needs in different countries.

What computer vision do you work on, and how are algorithms used in your office?

In the past few years, new deep learning-based reconstruction algorithms have been introduced, not only by Siemens Healthineers but also by other companies and universities. It’s a topic a lot of academics and PhD students are working on: how we can make the acquisition and the reconstruction faster, which means acquiring less data and still obtaining a comparable result thanks to the neural networks that lie behind it. For MRI, in particular, this means not having to compromise between signal-to-noise ratio, acquisition time, and image resolution. Now, you can find a nice balance between the three.

More signal, less noise.

Exactly! [Elisa laughs]

I have a devil’s advocate question – why does Siemens Healthineers need you in its workforce?

Everyone brings their own personal contribution to the role thanks to our past experience. What I think is valuable in my particular case is the technical background, because that means I can take a technical publication and translate it into a clinically relevant presentation, with a catchy title, or come up with a topic for an event that reflects what’s happening in the field, so that it becomes attractive to someone not necessarily as technical. Then, as I mentioned earlier, the communication skills, engaging with people and with the community, help a lot in this position.

What are you most proud of in the first years at Siemens Healthineers?

If I look back at the past three years, the difference I see in the confidence I have in the job I do day to day has changed massively. Starting as a fresh PhD student, getting into this huge company where there are so many smart people, and covering many more aspects and topics than I was used to. During the PhD, you look at a very narrow topic, and you’re the expert on that topic. Whereas, at this level, you have to cover much more at a higher level, so you’re not that deeply into everything, but still, it’s a lot. I remember that at the beginning, I was thinking, how will I be able to know all these things? Then, with time and experience, you learn the right balance between knowing enough to cover the topic without necessarily having a PhD in everything, which, of course, is not possible! I think what I’m most proud of now is feeling the confidence I have in handling my position and going out there and talking about my topics.

If I told you that 10 years from now, you will be a medical imaging researcher who writes code, would you believe me?

I think that will not happen! [she laughs] I tried that already, and it was a really fun experience. I especially liked seeing how a single line of code could change the outcome of what I could see in a patient’s scan. That was invaluable. Changing something that seemed to be a small thing on my laptop and then going downstairs at the scanner, and with that small change, I could see a tumor that I couldn’t see before. That was really exciting and satisfying.

Does it ever make you feel sad working with people’s diseases all the time?

No, that’s not a feeling I’ve ever had. On the contrary, I’m happy to, in my small way, contribute and do something about it. Of course, there’s always a huge team behind this. My contribution may be small, but I still feel like I’m doing something to go in that direction, to improve the technology so that people can receive an earlier diagnosis. Yeah, to have an impact and be involved, and also, especially for me, what is very important is to communicate what I do to the community, to my network, so that more people know what’s happening in the community. If MRI is going to be used for screening prostate cancer, for instance, which is something that might happen in the future, it’s good that people know about this, because not everyone reads scientific publications.

What is the best thing that you have learned or received from Siemens Healthineers? That’s a tricky one. I’m sure that you are very grateful to your company.

Indeed, it’s tricky because there are so many things! [she laughs] I would say, the fact that I now cover many more topics than I used to before. The possibility to learn something every single day. Because it’s such a big company, you have a lot of touchpoints with many different people. Some of them might be working in MRI, and others might be working in ultrasound or CT, so I get to expand my knowledge by being with all these people. Something that I also like very much is the travelling, which I’ve been doing quite often in the past couple of years. That gives me the opportunity to see how things are done in different countries and what’s really happening in the field – even just comparing Italy to Germany or to the UK or the US, every country has its own trends or preferences. The nice thing about these global roles is that you get to see a little bit of everything, and you get to think of what there is in common and what there is that’s different, how you can adapt your talk so that you include everyone and make sure that what you say is relevant for everyone. That’s something invaluable.

Is there anything else our readers should know about your work?

Maybe not necessarily related to my job, but the advice I’d give students is to try things out during your PhD. That’s the time when, yes, you’re super busy because, of course, you need to work on your thesis and projects, but that’s also the chance you have to experience new things. I was working mainly on the technical side. I tried something more toward the communication aspect, and that helped me find the direction I wanted to go. There might be someone who’s working in communication and could try out what programming means, and they might find out that’s exactly what they want or exactly what they don’t want to do. Yeah, just explore things as much as possible!

APRIL 2024

XPlan.ai Confirms Premier Precision in Peer-Reviewed Clinical Study of its 2D-to-3D Knee Reconstruction Solution

The study, featured in the prestigious Journal of Clinical Medicine, found sub-millimeter accuracy on real-world patient imaging, enabling widespread access to precise, image-based computer-assisted surgery without the need for a CT scan.

TEL AVIV, Israel – Mar 21, 2024 – XPlan.ai, a spinoff of RSIP Vision that uses AI to democratize precision orthopedics, reaches a new milestone for its X-ray based 3D bone modeling system with the publication of a peer-reviewed clinical study confirming unprecedented, sub-mm accuracy in a variety of clinically relevant measurements. The study, led by a consortium of orthopedic surgeons and published in the Journal of Clinical Medicine, found that XPlan.ai offers a promising alternative to conventional CT scans, opening up the market for image-based computer-assisted surgery such as robotics, AR, and navigation while increasing efficiency, saving costs and avoiding unnecessary radiation exposure.

XPlan.ai uses advanced artificial intelligence (AI) to produce accurate 3-dimensional bone models from two standard X-ray images. Together with XPlan’s automated planning technologies, this model can be used for surgical planning and navigation during orthopedic procedures such as total knee replacement, potentially providing very large patient populations with the most advanced care while avoiding the cost, time, administrative overhead, reimbursement issues, and added radiation involved in a conventional CT scan.

“Replacing CT scans with standard, universally available X-rays has long been considered a ‘holy grail’ of computer-assisted orthopedic surgery,” said Moshe Safran, CEO at XPlan.ai. “Accuracy and robustness have been the key challenges, and our technology provides unique capabilities in this regard, paving the way for universal access to image-based computer-assisted surgery.”

XPlan.ai’s groundbreaking technology provides two main benefits: clinical and operational. The clinical benefit is lower levels of radiation exposure from X-ray imaging compared to a complete knee CT scan. Operationally, all relevant patients routinely undergo X-ray imaging, offering much better accessibility than CT. Additionally, X-rays are more widely reimbursed in the U.S. healthcare system, are often lower in cost, and offer a quicker and more streamlined patient journey from diagnosis to the OR.

[Figure: XPlan’s high-fidelity X-ray based 3D knee model, compared to a conventional CT based model of the same patient]

The clinical evaluation of this tool was conducted using imaging from total knee replacement patients from Assuta Medical Center in Tel Aviv, a leading medical center in Israel. Unlike cadaver-based studies often used in the orthopedic technology space, the patients enrolled in this study had pathological anatomies typical of real-world clinical cases. The accuracy of the tool was evaluated by comparing the resulting 3D models to the ground truth patient anatomy given in a corresponding CT scan. The accuracy was measured in multiple areas that are used for actual surgical planning, including bony landmarks and anatomical axes, and was found to be equivalent to CT-based measurements at a sub-mm level across the board.

[Figure: Bony landmark regions. (A) Femur anterior cortex. (B) Femur posterior condyles. (C) Femur distal condyles. (D) Tibial tuberosity, medial and lateral plateaus.]

“Today’s orthopedic patients demand precise and personalized care, incorporating technologies such as AR and surgical robotics. Using a 3D image-based preoperative model to plan the case is the best approach, enabling surgeons to be better prepared and saving precious time in the OR,” said Dr. Vadim Benkovich, Head of Orthopedic Department at University Medical Center Soroka and Founder & Medical Director of the Israeli Joint Health Center at Assuta Medical Center. “This is a win for both patients and providers – both reducing the chances of a complication or infection, and at the same time improving efficiency and providing care to more patients with the same amount of resources. I am encouraged to see that the accuracy of XPlan’s solution has passed the most stringent tests conducted in our study.”

Going forward, XPlan.ai plans to apply for FDA clearance of its knee reconstruction solution. In parallel, further applications are under development for additional anatomies, with promising initial results indicating wide applicability of XPlan’s unique technology.

About XPlan.ai

XPlan.ai, a spinoff of RSIP Vision, is building an advanced AI-based solution that provides 3D bone reconstructions from standard 2D X-rays. Our platform technology can potentially improve the efficiency and safety of computer-assisted surgery, including eliminating the need for a preoperative CT scan and reducing time in the OR. XPlan.ai is headquartered in Jerusalem, Israel. More information is available on the company website at xplan.ai and by contacting us at info@xplan.ai.

[Figure: Moshe Safran, CEO at XPlan.ai]

Photos courtesy of Nadieh Khalili, an AI scientist from Iran. She poses in the photo on the right with Khrystyna Faryna, a PhD student from Ukraine. Both women are affiliated with Radboud University Medical Center. Nadieh is interested in multimodal medical data ranging from pathology and radiology to genomics.

Olivia Sandvold is a fourth-year bioengineering PhD student at the University of Pennsylvania. Her paper, demonstrating a novel hybrid spectral CT system to reduce error in iodine quantification, has just won the Physics of Medical Imaging Best Paper Award at SPIE Medical Imaging. This impressive young woman is here to tell us all about it.

Physics of Medical Imaging Best Paper

Hybrid spectral CT system with clinical rapid kVp-switching X-ray tube and dual-layer detector for improved iodine quantification

In this paper, Olivia introduces a new multi-energy hybrid CT system. It combines a traditional kVp-switching X-ray tube source, which alternates between various energy levels emitted from the X-ray tube, with a dual-layer or sandwich detector, a spectral detector that discriminates energy into upper and lower domains. The work demonstrates that joining these technologies and leveraging the properties of the multiple channels acquired can reduce errors in iodine quantification, which is important in clinical diagnoses. Physicians use iodine contrast to measure increased uptake from the bloodstream or the oral pathway into different body cavities. If accurate, these measurements can act as quantitative biomarkers for disease.

“We want to reduce the number of scans that patients have to take over a period of time,” Olivia tells us. “As we know, CT involves X-rays, and X-rays involve radiation, and we always want to decrease the amount of radiation we supply to the patient. If we have an increased sensitivity to iodine to be able to acquire very high-quality images for diagnosis, then the patient should not have to undergo many additional scans.”

From a clinician’s perspective, this represents an improvement over conventional CT or even single-instrumentation spectral CT systems, offering greater confidence in quantitative measurements. This newfound precision improves pre- and post-treatment assessments and contributes to establishing standardized medical practices, ultimately benefiting the entire medical community through increased knowledge and accuracy in diagnostics.

Olivia says the system encompasses hardware and software advancements over previous setups. She points out that thinking about hardware in combination with software, AI, and machine learning is key. “If we have better underlying measurements from our hardware, we’ll be better at applying different computer vision, downstream image processing, and algorithmic development,” she attests. “The problem is now we have four channels of data, so what do we do with all this data? We’re proposing different software instrumentations and the inclusion of these four channels to utilize each one of the spectral properties specifically.”

[Figure: Top-down view of bench]

Getting the different systems to talk to each other presented a challenge. With a rapid kVp-switching tube, which uses both high-energy and low-energy X-rays, it was vital to precisely coordinate the switching time from high to low so that the measurements did not get mixed up. Ensuring a good hardware setup was critical to accurately separating things.

“On the software side, we wanted to make sure that when we segregate our four channels of data and then combine them to have the least bias in our measurements, we have appropriate weighting based on the spectra,” Olivia explains. “It was a challenge to take all of the information our benchtop system gave us and then have a new pipeline in place that fits into the existing clinical pipeline. We have a starting point and an endpoint, and we want to make sure that what we build falls into that scheme so that we don’t have to change a lot if we push this to the clinical level.”

[Figure: Labeled components of hybrid system]

While the current focus remains on iodine quantification, Olivia hints at potential future applications, including the integration of computer vision for automated segmentation of different regions of high iodine contrast. Higher sensitivity to iodine allows the generation of highly accurate iodine maps, which could revolutionize tumor analysis, grading, staging, and metastases detection. Did we just get a sneak preview of next year’s winning paper?

“Yes, hopefully!” she laughs. “In this paper, we’ve shown improved quantification and decreased error by using and leveraging these two technologies. Now that we’re more confident, what else can we do that has a downstream effect on specific patient populations?”

Having scooped the Best Paper award, Olivia tells us she is very proud of the work and honored that it has been recognized in this way. She set out to investigate something people are curious about but may not have the resources to work on in their labs. Could this have been the secret to its success in the eyes of the judges?

“I think one of the compelling reasons this paper was important to the committee was that we’re doing our best to ensure translatability from the physical work to the clinic,” she remarks. “Being at the University of Pennsylvania, we were so fortunate to be able to build a system that uses clinical components and is not just a benchtop system throwing together parts acquired from an X-ray source or detector.”

Olivia is currently in the fourth year of her PhD and hopes to graduate in the next couple of years. “I’m passionate about continuing research and development with medical imaging,” she adds. “I’d like to continue working in this field, looking at CT physics and the development of new devices. That’s my goal. I could see myself continuing in academic research, but also teaching, down the line, and potentially working in the industry, so it’s still a little bit open.”

We certainly hope this award will help open the right doors. “Thank you so much,” she smiles. “It definitely helps to have a little recognition!”

[Figure: Three defined weighting schemes to combine multi-energy channel data]
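The idea of weighting four measurement channels before combining them, as Olivia describes, can be illustrated with a minimal sketch. The article does not specify the paper's three weighting schemes, so the example below swaps in plain inverse-variance weighting, a generic statistical technique that minimizes the variance of the combined estimate. The function name `combine_channels`, the iodine values, and the variances are all hypothetical and chosen only for illustration.

```python
import numpy as np

def combine_channels(estimates, variances):
    """Combine per-channel estimates using inverse-variance weights.

    This is a generic statistical scheme, NOT the weighting used in the paper.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    weights /= weights.sum()                      # normalize so weights sum to 1
    combined = float(np.dot(weights, estimates))  # weighted average
    combined_variance = float(1.0 / np.sum(1.0 / variances))
    return combined, combined_variance, weights

# Hypothetical iodine estimates (mg/mL) from four channels, for example
# high/low kVp crossed with upper/lower detector layer, with made-up noise levels.
channel_estimates = [5.2, 4.9, 5.1, 4.7]
channel_variances = [0.04, 0.09, 0.04, 0.16]

combined, combined_variance, weights = combine_channels(
    channel_estimates, channel_variances
)
print("weights:", np.round(weights, 3))
print(f"combined estimate: {combined:.3f} mg/mL (variance {combined_variance:.4f})")
```

Under these assumptions, noisier channels contribute less, and the combined variance is lower than that of any single channel. A spectra-based scheme like the one the article mentions would presumably derive its weights from the physics of each channel rather than from assumed noise levels.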

Zhangxing Bian is a third-year PhD candidate at Johns Hopkins University. Fresh from winning the 2024 Image Processing Best Student Paper Award at the SPIE Medical Imaging conference, he is here to tell us more about his work on tag fading, a post-processing complication that affects tagged MRI.

Image Processing Best Student Paper

Is registering raw tagged-MR enough for strain estimation in the era of deep learning?

Tagged MRI (tMRI) is a specialized technique that adds specific patterns to tissues, similar to temporary tattoos. When the tissue moves, the tag moves with it, and clinicians can track these movements to better understand cardiac, muscular, and speech-related functions post-injury or in a disease context. However, a challenge within this domain is the phenomenon of tag fading, where the visibility of tags diminishes over time, complicating accurate motion tracking and analysis.

In this paper, Zhangxing wants to understand what causes tag fading and what can be done post-processing to estimate tissue motion better. “Two decades ago, researchers proposed some classic signal processing methods, which extract the material’s phase information through a Fourier transform for tracking the motion of the tissue,” he explains. “It can be seen as a special type of phase-based optical flow approach. The benefit is it circumvents the tag fading problem.”

Recent advancements in deep learning have sparked a revolution across various fields, presenting alternative methods that do not use those classic techniques to preprocess the image but directly process raw tMRI inputs to estimate the motion or strain fields of the moving tissues.
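To see why a phase-based method can sidestep tag fading, consider a minimal one-dimensional sketch. It is not the method evaluated in the paper: it band-passes a single harmonic peak of a synthetic tagged signal in the Fourier domain and reads off its phase. The tag frequency, displacement, contrast values, and the helper names `harmonic_phase` and `tagged` are assumptions made for illustration.

```python
import numpy as np

def harmonic_phase(signal, tag_freq, half_width):
    """Band-pass one spectral peak of a tagged signal and return its phase."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size)            # cycles per sample
    mask = np.abs(freqs - tag_freq) <= half_width  # keep the positive harmonic only
    return np.angle(np.fft.ifft(spectrum * mask))

n = 256
x = np.arange(n)
tag_freq = 16 / n    # assumed: 16 tag periods across the field of view
true_shift = 1.5     # hypothetical rigid displacement, in samples

def tagged(displacement, contrast):
    # Tag pattern on a constant background; `contrast` mimics tag fading.
    return 1.0 + contrast * np.cos(2 * np.pi * tag_freq * (x - displacement))

reference = tagged(0.0, contrast=1.0)
moved_faded = tagged(true_shift, contrast=0.3)

# Raw intensities differ even with zero motion once the tags fade,
# so brightness constancy is violated.
intensity_gap = np.abs(tagged(0.0, 1.0) - tagged(0.0, 0.3)).max()

# The harmonic phase ignores the amplitude change and still encodes position.
phase_ref = harmonic_phase(reference, tag_freq, half_width=4 / n)
phase_mov = harmonic_phase(moved_faded, tag_freq, half_width=4 / n)
wrapped = np.angle(np.exp(1j * (phase_ref - phase_mov)))
estimated_shift = wrapped / (2 * np.pi * tag_freq)

print(f"max intensity gap without motion: {intensity_gap:.2f}")
print(f"mean estimated shift: {estimated_shift.mean():.3f} samples")
```

In this toy setting the fading changes only the amplitude of the harmonic, so the recovered displacement is unaffected, while an intensity-based similarity measure would register a mismatch even for stationary tissue.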

Zhangxing revisits both methodologies in this work, highlighting the current limitations of deep learning in biomedical applications and emphasizing that it is not a universal solution to this problem. Instead, he advocates for a balanced approach that integrates classic signal processing.

“People’s understanding of tag fading is currently not very complete,” he advises. “Our first contribution is to model the tag fading by considering factors that previous research ignored. The interplay between the T1 relaxation and the repeated application of radio frequency pulses during the imaging sequences was overlooked in previous research on tMRI post-processing. We build a mathematical model to factor that interplay into the equation.”

[Figure: The left image is a sagittal view of a head. The video on the right shows the tagged MRI acquired during speech when the tongue is moving. Tagged MRI exhibits a significant phenomenon called tag fading, which is a gradual decrease in tag visibility over time. The brightness constancy assumption used in optical flow or image registration does not hold, which leads to inaccurate motion estimation.]

The findings of this work are derived from both simulated images and an actual phantom scan. Experiments on synthetic and real tMRI reveal the limitations of widely used similarity losses in raw tMRI and emphasize caution in registration tasks where image intensity changes over time.

While not proposing a new algorithm, this multidimensional work encompasses a thorough comparative analysis between deep learning and traditional methods to try to solve the tag fading issue. Zhangxing thinks this is why it piqued the interest of the SPIE Medical Imaging Best Paper Awards judges.

“The conference typically has different specialization tracks,” he tells us. “The two major tracks are called Medical Imaging and Image Processing. I think one reason this paper drew the committee’s attention is that it provides both sides with an understanding of the tag fading issue. Also, our work is like a whistleblower to remind people that there is something that deep learning currently can’t handle well and that, two decades ago, the traditional signal processing method did pretty well and elegantly. This observation provides some promising directions for the field going forward.”

Looking ahead, Zhangxing is working on extending the research by delving into deep similarity metric learning. This technique has shown some promise in the inter-modality image registration task and is presumed to be robust against the challenges posed by tag fading. Although this work is limited to using the classic 1:1 SPAMM tagging sequence, the potential of using Complementary SPAMM (CSPAMM) sequences to ...

[Figure: The classic 1:1 SPAMM sequence is used for acquiring tagged MRI. Each tagging step is followed by a series of imaging sequences. “TF” stands for timeframe. During each TR interval, the spin system is tipped by alpha degrees and multiple line segments in k-space are captured using gradient echoes. The “tagging-imaging” cycle is repeated until sufficient k-space coverage is achieved. Based on this imaging sequence, a mathematical model has been built in this research to better understand the tag fading process.]
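The interplay between T1 relaxation and repeated RF pulses can be illustrated with a textbook simplification. This is not the model derived in the paper. The sketch assumes the longitudinal magnetization starts as Mz = M0 * cos(k x) right after tagging, that each imaging pulse with flip angle alpha scales it by cos(alpha), and that it then relaxes toward M0 over one TR. The function name `simulate_tag_fading` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_tag_fading(n_pulses, flip_deg, tr_ms, t1_ms, m0=1.0):
    """Track tagged and untagged longitudinal magnetization across RF pulses.

    Returns two arrays with the values present at pulse 0, 1, ..., n_pulses - 1.
    Simplified recursion (an assumption, not the paper's model):
        Mz[n + 1] = Mz[n] * cos(alpha) * E1 + M0 * (1 - E1),  E1 = exp(-TR / T1)
    """
    alpha = np.deg2rad(flip_deg)
    e1 = np.exp(-tr_ms / t1_ms)    # T1 relaxation factor over one TR
    decay = np.cos(alpha) * e1     # per-TR attenuation of existing magnetization
    tagged = np.empty(n_pulses)
    untagged = np.empty(n_pulses)
    tag_amp, recovered = m0, 0.0   # assumed start: tag amplitude M0, no untagged part
    for n in range(n_pulses):
        tagged[n] = tag_amp
        untagged[n] = recovered
        tag_amp = tag_amp * decay                        # modulated part only shrinks
        recovered = recovered * decay + m0 * (1.0 - e1)  # untagged part regrows
    return tagged, untagged

# Hypothetical parameters, chosen only for illustration.
tagged, untagged = simulate_tag_fading(n_pulses=50, flip_deg=10.0, tr_ms=5.0, t1_ms=1000.0)
t1_only, _ = simulate_tag_fading(n_pulses=50, flip_deg=0.0, tr_ms=5.0, t1_ms=1000.0)

print(f"tag amplitude at last pulse, with RF pulses: {tagged[-1]:.3f}")
print(f"tag amplitude at last pulse, T1 only:        {t1_only[-1]:.3f}")
```

Under these assumptions the tag amplitude shrinks by a factor of cos(alpha) * exp(-TR/T1) per pulse, which is faster than T1 relaxation alone, while the untagged background grows. That shift in intensity over time is why similarity losses built on brightness constancy struggle with raw tagged images.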
“The aim of this research is to better understand the tag fading phenomenon itself and evaluate the effectiveness of both traditional harmonic phase-based methods and deep learning-based registration methods (trained with diverse similarity objectives) when estimating motion with the presence of tag fading.”
