support their research. “The difficulty is really getting domain expertise to work with us,” he says. “We need to spend time communicating with marine biologists so we can understand how these tools could improve their work.”

For Kit, this collaboration reflects a broader goal for the field. “As computer vision scientists working on AI, we should be more like the first step movers,” he asserts. “We should have open arms and just let them collaborate with us because, at the end of the day, we want AI to contribute to scientific study. AI for science is a bigger goal. We need to work closely together for the greater good.”

On the technical side, the team found that current vision-language models still struggle with detailed biological descriptions. Existing captioning systems often produce very short summaries of images. “Usually, they just say that this is a red fish, this is a green fish,” David says. “Marine biologists need a more professional description.” Scientists, by contrast, often rely on fine-grained details such as the shape of the dorsal fin, body proportions, or distinctive scale patterns.

Generating those descriptions requires models to pay attention to specific regions of an organism. In their experiments, the researchers found that many systems focus primarily on the overall scene rather than on the fine-grained details needed for species identification. This gap highlights the need for models that integrate global context with localized visual analysis.

For David, the collaboration itself stands out as one of the most rewarding aspects of the project. Working closely with marine biologists

DAILY WACV Sunday
Yuk Kwan (David) Wong