Computer Vision News - May 2022

The following figure shows a representation of visual transformers. ViTs have proven very successful for segmentation: clear-cut, detailed segmentation is a decisive step in image-guided treatment and computer-aided diagnosis. A great number of image segmentation models have been proposed over the last 40 years, from traditional models to deep neural networks, which have outperformed all earlier state-of-the-art segmentation methods. Transformers perform prominently in accurate segmentation of medical images because of their capability to model the global context: since organs spread over a wide receptive field, transformers can easily encode them by modelling the associations between spatially distant pixels. The background is also dispersed in medical scans; for that reason, understanding the global context among the pixels that belong to the background helps the model classify them correctly, as shown below (a minimal sketch of this patch-level self-attention follows at the end of this section).

Xiong et al. proposed a novel hierarchical neural network architecture that uses reinforcement learning to generate long, coherent medical reports. They incorporated the self-critical reinforcement learning method into the detector, encoder, and captioning decoder. They used DenseNet-121, pre-trained on the ChestX-ray14 dataset, to detect region-of-interest (ROI) proposals using a bottom-up attention mechanism; the region detector output a set of ROI proposals along with predicted classes and associated attributes. A top-down transformer visual encoder then extracted further pixel-wise visual information from the proposed ROIs using pooling operations. Their proposed architecture outperformed the state-of-the-art methods on the CIDEr evaluation metric on the IU-Xray dataset, but on the BLEU-1 metric it did not reach state-of-the-art performance. Their model over-fitted because they used only the findings portion of the medical reports; a larger labelled dataset would be needed to solve this problem.
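To make the global-context point concrete, here is a minimal PyTorch sketch of patch-level self-attention: a scan is split into non-overlapping patches, and every patch token attends to every other one, so spatially distant organ regions interact in a single layer. The patch size, embedding dimension, and head count are illustrative choices, not taken from any particular paper.

import torch
import torch.nn as nn

# Minimal sketch: patch embedding plus one self-attention layer.
# All hyperparameters below are illustrative assumptions.
class PatchSelfAttention(nn.Module):
    def __init__(self, in_ch=1, patch=16, dim=256, heads=8):
        super().__init__()
        # Non-overlapping patches via a strided convolution.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (B, C, H, W) scan
        tokens = self.embed(x)                     # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2) # (B, num_patches, dim)
        t = self.norm(tokens)
        # Every patch token attends to every other token, so two organ
        # regions far apart in the image interact in a single layer.
        out, weights = self.attn(t, t, t)
        return out, weights                        # weights: (B, N, N)

scan = torch.randn(1, 1, 224, 224)                 # dummy grayscale scan
feats, attn = PatchSelfAttention()(scan)
print(feats.shape, attn.shape)                     # (1, 196, 256) (1, 196, 196)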
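The region-feature step of such a report-generation pipeline can be sketched as follows, using torchvision's DenseNet-121 backbone and roi_align. The box coordinates, pooling size, and randomly initialized weights are placeholders; Xiong et al. pre-trained their detector on ChestX-ray14, which is not reproduced here.

import torch
from torchvision.models import densenet121
from torchvision.ops import roi_align

# DenseNet-121 convolutional backbone produces a feature map; a fixed-size
# feature is then pooled for each ROI proposal. weights=None here because
# the paper's ChestX-ray14 pre-training is not reproduced in this sketch.
backbone = densenet121(weights=None).features
xray = torch.randn(1, 3, 224, 224)                 # dummy chest X-ray
fmap = backbone(xray)                              # (1, 1024, 7, 7)

# Hypothetical ROI proposals as (batch_idx, x1, y1, x2, y2) in image coords.
rois = torch.tensor([[0.,  10.,  20., 120., 160.],
                     [0.,  80.,  40., 200., 210.]])
# spatial_scale maps image coordinates onto the 7x7 feature map (7/224).
roi_feats = roi_align(fmap, rois, output_size=(3, 3), spatial_scale=7 / 224)
print(roi_feats.shape)                             # (2, 1024, 3, 3)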
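Finally, the self-critical training signal is simple to express: the reward (e.g. CIDEr) of a sampled report is compared against the reward of the model's own greedy decode, and that difference weights the REINFORCE gradient, so no learned critic is needed. The sketch below assumes a generic decoder and scorer; decoder.sample, decoder.greedy, and cider are hypothetical names standing in for whatever captioning model and metric implementation are used.

import torch

def self_critical_loss(log_probs, sample_reward, greedy_reward):
    """REINFORCE with the greedy decode as baseline (self-critical training).

    log_probs:     (B, T) log-probabilities of the *sampled* report tokens
    sample_reward: (B,)   e.g. CIDEr score of each sampled report
    greedy_reward: (B,)   CIDEr score of the greedy-decoded report
    """
    # Advantage: how much better sampling did than the model's own
    # greedy output. Detached so the reward is treated as a constant.
    advantage = (sample_reward - greedy_reward).detach()
    # Push up token log-probs when the sampled report beats the greedy
    # baseline; push them down otherwise.
    return -(advantage.unsqueeze(1) * log_probs).sum(dim=1).mean()

# Hypothetical usage inside a training step:
# reports_s, log_probs = decoder.sample(roi_feats)   # stochastic decode
# reports_g = decoder.greedy(roi_feats)              # baseline decode
# loss = self_critical_loss(log_probs,
#                           cider(reports_s, refs),  # assumed CIDEr scorer
#                           cider(reports_g, refs))
# loss.backward()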
