
massive dataset; and third, lots of computing,” he explains. “SAM already demonstrated that the recent transformer is a great network. In the beginning, our two main bottlenecks were the data and the computing. We spent nearly two months creating data. These datasets are scattered across the network and stored in different formats. We invested a lot of time standardizing these images into a common format. Then, since training from scratch is almost impossible, we decided to use transfer learning.”

The team created a large-scale medical image dataset with more than 1.5 million image-mask pairs, gathering every publicly available dataset they could find. Their original plan was to train the model from scratch, but they realized the computing effort required was too large to be affordable: Meta used more than 200 GPUs for its original paper, far beyond the means of most academic groups. Ultimately, they employed transfer learning to enhance the model’s ability to handle medical images.

“This method is a promptable segmentation method, so it contains an image encoder, which is a vision transformer that extracts features from the image to obtain the image embedding, and a prompt encoder, which encodes the bounding box prompt to obtain the prompt embedding,” Jun reveals. “Then, we also have a transformer-based mask decoder that merges the image embedding and the prompt embedding to generate the final masks.”

For Bo, the clinical potential of this approach was the most exciting part. A trial with two groups of doctors showed an 80%+ increase in workflow efficiency for the doctors assisted by MedSAM over those following a standard annotation workflow.

[Image caption: MedSAM can be used to reduce the annotation time cost by 80%+.]
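To make the standardization step concrete, here is a minimal sketch of the kind of preprocessing pipeline Jun describes, assuming SimpleITK and scikit-image for reading and resizing. The function name, percentile window, and 1024x1024 target size are illustrative assumptions, not details from the interview:

```python
# Hypothetical sketch: unify heterogeneous medical images (NIfTI, DICOM, PNG)
# into fixed-size, intensity-normalized arrays. Names and sizes are illustrative.
import numpy as np
import SimpleITK as sitk
from skimage.transform import resize

def standardize_image(path: str, target: int = 1024) -> np.ndarray:
    """Read one 2D image/slice and return a uint8 array on a common grid."""
    arr = sitk.GetArrayFromImage(sitk.ReadImage(path)).astype(np.float32)
    arr = np.squeeze(arr)                     # drop singleton dimensions
    lo, hi = np.percentile(arr, [0.5, 99.5])  # clip intensity outliers
    arr = np.clip(arr, lo, hi)
    arr = (arr - lo) / max(hi - lo, 1e-8)     # min-max normalize to [0, 1]
    arr = resize(arr, (target, target), order=1,
                 preserve_range=True, anti_aliasing=True)
    return (arr * 255).astype(np.uint8)
```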
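The transfer-learning decision could look roughly like the following, assuming the public segment-anything package and a released SAM ViT-B checkpoint. Freezing the prompt encoder while updating the image encoder and mask decoder is an assumption on our part; the interview only states that transfer learning was used. The training step is shown schematically in comments:

```python
# Hedged sketch: fine-tune a pretrained SAM checkpoint on medical image-mask
# pairs instead of training from scratch. The checkpoint path, learning rate,
# and the decision to freeze the prompt encoder are illustrative assumptions.
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False                  # keep the prompt encoder frozen

params = list(sam.image_encoder.parameters()) + list(sam.mask_decoder.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4, weight_decay=0.01)
loss_fn = torch.nn.BCEWithLogitsLoss()       # often paired with a Dice loss

# for image, gt_mask, box in loader:              # one training step, schematically
#     pred = forward_with_box(sam, image, box)    # hypothetical helper; see the
#     loss = loss_fn(pred, gt_mask)               # inference sketch below
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```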
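Jun’s description of the promptable design (image encoder, prompt encoder, mask decoder) maps directly onto the public segment-anything API. Below is a hedged inference sketch; the checkpoint name, input shapes, and box coordinates are placeholders:

```python
# Hedged sketch of the promptable forward pass: image encoder -> image
# embedding, prompt encoder -> box embedding, mask decoder -> mask.
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="medsam_checkpoint.pth")
sam.eval()

image = torch.randn(1, 3, 1024, 1024)               # preprocessed image batch
box = torch.tensor([[100.0, 120.0, 480.0, 500.0]])  # one xyxy bounding box

with torch.no_grad():
    image_embedding = sam.image_encoder(image)      # (1, 256, 64, 64) features
    sparse_emb, dense_emb = sam.prompt_encoder(
        points=None, boxes=box, masks=None          # encode the box prompt
    )
    low_res_masks, iou_pred = sam.mask_decoder(
        image_embeddings=image_embedding,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_emb,
        dense_prompt_embeddings=dense_emb,
        multimask_output=False,                     # one mask per box prompt
    )

mask = (low_res_masks > 0).float()                  # threshold logits to a binary mask
```

Because only the box prompt changes between structures, the expensive image embedding can be computed once per image and reused across prompts, which is part of what makes this kind of interactive annotation workflow fast.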
