Computer Vision News - January 2024

datascEYEnce!

Switching from NLP to computer vision was relatively seamless thanks to university courses and the overlap in basic knowledge between NLP and CV research. The bigger gap she had to fill was understanding the data itself. Having mostly worked with text data, which is easy to interpret, she now needed to recognise different disease patterns in two retinal imaging modalities: optical coherence tomography (OCT) and colour fundus photography (CFP). She also learned that the two modalities give complementary information about a disease. This makes it worthwhile to combine forces and train a model that learns from both modalities.

After mastering the data, Lehan focused on the multimodality problem. The first approach she considered was concatenating features to fuse the two datasets, but the available research did not match her needs, so it was difficult to find a working baseline. With her available data, multimodal image fusion performed similarly to, or even worse than, a single modality. Lehan needed to tackle the problem from another angle. She explored two further options for fusing knowledge: contrastive learning and knowledge distillation. Opting for knowledge distillation, she used a teacher model trained on fundus images and a student model for OCT images. She found that the representations are similar even across different modalities, as can be seen from the proximity in the feature