Computer Vision News - December 2022

Anita Rau

3D information from colonoscopic images. But ground truth is essential for learning. Therefore, one central question I sought to answer was how to predict depth during medical procedures without the need for real ground truth. I found synthetic data (Figure 1) to be a helpful alternative to real data because we could simply generate all the necessary labels. I thus created one of the first public, synthetic datasets for colonoscopy [1].

While synthetic data provides ground truth and can be used to train networks, its appearance is distinguishable from that of real data. So the focus of my research shifted from how to predict depth to how to integrate two different domains (synthetic and real) into a mutual framework. Previous works use two networks: one for domain adaptation and another for the task itself. But we found that combining both tasks in a single framework leads to more resilient depth maps (Figure 2) [2].

Today, depth networks for colonoscopy can predict local 3D shapes fairly well, but understanding the geometric relationship between two images remains challenging. My collaborators and I found that box embeddings, a concept derived from natural language processing, can be applied to images and help predict this relationship directly and in a human-interpretable way [3].

But integrating local structures into a global map of a colon has yet to be solved. Therefore, one of my final tasks during my PhD was to improve our public dataset [4] and help organize a challenge that will hopefully help other researchers tackle the remaining challenges in colonoscopic 3D reconstruction [5]!

Figure 2: Trained with synthetic ground truth, our model can predict depth from real images.
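To make the box-embedding idea concrete: each image is embedded as an axis-aligned box in a latent space, and the fraction of one box's volume lying inside another yields a directly interpretable overlap score between two views. The sketch below is a minimal toy illustration under that assumption; the function names and the 2-D example boxes are illustrative, not the actual model from [3].

```python
# Toy sketch of axis-aligned box overlap, the geometric primitive behind
# box embeddings. Names and example values are illustrative assumptions;
# the real method learns the box corners with a neural network.

def box_volume(lo, hi):
    """Volume of an axis-aligned box given corner lists lo <= hi."""
    v = 1.0
    for a, b in zip(lo, hi):
        v *= max(b - a, 0.0)  # empty along any axis -> volume 0
    return v

def containment(lo_a, hi_a, lo_b, hi_b):
    """Fraction of box A's volume that lies inside box B.

    Near 1: image A's view is mostly contained in image B's view.
    Near 0: the two views share little content.
    """
    inter_lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    inter_hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    vol_a = box_volume(lo_a, hi_a)
    if vol_a == 0.0:
        return 0.0
    return box_volume(inter_lo, inter_hi) / vol_a

# 2-D toy example: box A sits entirely inside box B.
print(containment([1, 1], [2, 2], [0, 0], [3, 3]))  # → 1.0
print(containment([0, 0], [3, 3], [1, 1], [2, 2]))  # → 1/9 ≈ 0.111
```

Because containment is asymmetric, the pair of scores (A in B, B in A) conveys which view "sees more", which is what makes the representation human-interpretable.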