Computer Vision News - December 2023

A Flexible Nadaraya-Watson Head Can Offer …

by Alan Wang

The paper was first published in Transactions on Machine Learning Research.

The dominant approach to image classification in computer vision is to feed the image through a neural network (e.g. ResNet, ViT) that extracts features of the image, and then to pass those features through a fully-connected (FC) head. This architecture is a black box: a human cannot understand how the model arrived at its decision. In addition, the predictions are often poorly calibrated, meaning that the model can be overconfident about its correctness. This can mislead humans as to how trustworthy the model really is.

As an alternative, we propose the Nadaraya-Watson (NW) head, which we show provides better-calibrated predictions and better interpretability and explainability while performing comparably to the FC head.

Essentially, the NW head can be viewed as a soft variant of the nearest-neighbor classifier. For each query image to be classified, we assume we have access to a "support set" of real samples and their labels from the training dataset. To produce a prediction, the NW head passes both the query image and all support images through the feature extractor, and then computes a similarity between the query feature and each support feature. These similarities are normalized into weights, which are in turn used to compute a weighted average of the labels in the support set. This weighted average is the final prediction (see image next page).

How is the support set constructed? During training, one can sample the query and support set randomly from the training set at each training step. During inference, the user has a high degree of flexibility in how to choose the support set. In our experiments, we try three options: randomly sampling from the training set, using the entire training set, and performing within-class clustering to construct the support set.
Each of these inference "modes" has its own advantages and trade-offs, providing flexibility to the user.

Experimentally, we find that the NW head performs comparably to or better than the FC head while providing better-calibrated predictions. These characteristics are essential in real-world deployment scenarios.
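
To make the prediction step described above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the features have already been produced by a feature extractor (toy 2-D vectors stand in for them), and it assumes the similarity is the negative squared Euclidean distance normalized with a softmax, since the article only says "similarity". The names `nw_head`, `random_support`, and `temperature` are hypothetical and chosen for illustration.

```python
import math
import random


def nw_head(query_feat, support_feats, support_labels, num_classes, temperature=1.0):
    """Return (class probabilities, support weights) for one query feature.

    Assumption: similarity = negative squared Euclidean distance, and the
    normalization is a softmax. The article does not specify either choice.
    """
    # Similarity between the query feature and each support feature.
    sims = [
        -sum((q - s) ** 2 for q, s in zip(query_feat, feat)) / temperature
        for feat in support_feats
    ]
    # Normalize similarities to weights (max subtracted for numerical stability).
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the one-hot support labels.
    probs = [0.0] * num_classes
    for w, label in zip(weights, support_labels):
        probs[label] += w
    return probs, weights


def random_support(train_feats, train_labels, size, rng):
    """Random inference mode: sample a support set from the training set."""
    idx = rng.sample(range(len(train_feats)), size)
    return [train_feats[i] for i in idx], [train_labels[i] for i in idx]


# Toy 2-D "features" standing in for feature-extractor outputs (made-up data).
train_feats = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]]
train_labels = [0, 0, 1, 1]

# Full inference mode: use the entire training set as the support set.
probs, weights = nw_head([0.2, 0.1], train_feats, train_labels, num_classes=2)
predicted = max(range(2), key=lambda c: probs[c])
```

In this sketch the returned `weights` show how much each support sample contributed to the prediction, which is one way a user might inspect a decision. The within-class clustering mode is omitted here because the article does not describe how the clusters are formed.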
