# Image Interpretation Using Pixel Attribution

Pixel attribution methods highlight the pixels that were relevant for a particular image classification made by a neural network. Figure 12-6 shows an example of such an explanation.

Pixel attribution is a special case of feature attribution for images. Feature attribution explains an individual prediction by assigning each input feature a relevance that reflects how it changed the prediction (negatively or positively). The features can be input pixels, tabular data, or words. SHAP, Shapley values, and LIME are examples of general feature attribution methods. We consider neural networks whose prediction is a vector of length $C$, which includes regression, where $C = 1$. The output of the neural network for an image $I$ is written $S(I) = [S_1(I), \ldots, S_C(I)]$. All these methods take as input $x \in \mathbb{R}^p$ (which can be image pixels, tabular data, words, …) with $p$ features and output as explanation a relevance value for each of the $p$ input features: $R^c = [R^c_1, \ldots, R^c_p]$. The index $c$ indicates that these are the relevances for the $c$-th output $S_c(I)$.
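As a purely illustrative instance of this notation (not an example from the text), assume a linear model $S_c(x) = w_c^\top x + b_c$ with weight vector $w_c \in \mathbb{R}^p$ and bias $b_c$. If the relevance is chosen as gradient times input (one common gradient-based choice, assumed here), each feature gets one relevance value, and these values add up to the output minus the bias:

$$
R^c_j = \frac{\partial S_c(x)}{\partial x_j}\, x_j = w_{c,j}\, x_j,
\qquad
\sum_{j=1}^{p} R^c_j = w_c^\top x = S_c(x) - b_c .
$$

In this linear case each $R^c_j$ is exactly feature $j$'s additive contribution; for a nonlinear network, gradient times input generally does not sum to the output in this way.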

There is a confusing number of pixel attribution approaches. It helps to know that they fall into two different types of attribution methods.

• Occlusion- or perturbation-based: Methods like SHAP and LIME manipulate parts of the image to generate explanations (model-agnostic).
• Gradient-based: Many methods compute the gradient of the prediction (or classification score) with respect to the input features. The gradient-based methods (of which there are many) mostly differ in how the gradient is computed.
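The two families can be contrasted with a small NumPy sketch. Everything here is an illustrative assumption rather than a specific published method: the toy linear "network" `linear_scores`, the hypothetical helpers `occlusion_attribution` and `gradient_attribution`, the zero baseline, and the use of finite differences in place of backpropagation. Occlusion hides a patch and records the drop in $S_c$; the gradient method estimates $\partial S_c / \partial I$ for every pixel.

```python
import numpy as np

def occlusion_attribution(score_fn, image, c, patch=1, baseline=0.0):
    """Perturbation-based relevance: drop in S_c when a patch is replaced by a baseline."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    base_score = score_fn(image)[c]
    relevance = np.zeros_like(image)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            # Every pixel in the patch receives the score drop caused by hiding it
            relevance[i:i + patch, j:j + patch] = base_score - score_fn(occluded)[c]
    return relevance

def gradient_attribution(score_fn, image, c, eps=1e-5):
    """Gradient-based relevance: dS_c/dI estimated with central finite differences."""
    image = np.asarray(image, dtype=float)
    grad = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        plus, minus = image.copy(), image.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (score_fn(plus)[c] - score_fn(minus)[c]) / (2 * eps)
    return grad

# Hypothetical toy "network" on 4x4 single-channel images with 3 classes:
# S_c(I) = sum over pixels of weights[c] * I + bias[c]
rng = np.random.default_rng(0)
HEIGHT, WIDTH, N_CLASSES = 4, 4, 3
weights = rng.normal(size=(N_CLASSES, HEIGHT, WIDTH))
bias = rng.normal(size=N_CLASSES)

def linear_scores(image):
    # Contract the (HEIGHT, WIDTH) axes of weights with the image -> shape (N_CLASSES,)
    return np.tensordot(weights, image, axes=([1, 2], [0, 1])) + bias

image = rng.uniform(size=(HEIGHT, WIDTH))
occ = occlusion_attribution(linear_scores, image, c=0)
grad = gradient_attribution(linear_scores, image, c=0)
# With a linear model and a zero baseline, occlusion equals gradient * input
print(np.allclose(occ, grad * image))  # True
```

For a linear model the two views coincide, which makes the sketch easy to check; for a real, nonlinear network they generally give different maps, and a real implementation would obtain the gradient by backpropagation instead of one forward pass pair per pixel.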

## Image Interpretation Using Class Activation Maps

Class activation maps (CAM) were introduced in the paper “Learning Deep Features for Discriminative Localization”, which uses global average pooling in CNNs. The CAM for a particular category indicates the discriminative image region that the CNN uses to identify that category.

It has been observed that the convolutional units in various layers of a convolutional neural network act as object detectors, even though no information about the location of objects is provided while the network is trained for a classification task. This remarkable localization ability is lost, however, when fully connected layers are used for classification. To avoid fully connected layers, some architectures, such as Network in Network (NiN) and GoogLeNet, are designed as fully convolutional networks. Global average pooling (GAP) is a commonly used layer in such architectures, mainly as a regularizer that prevents overfitting during training.

The class activation map indicates the discriminative image region that the CNN uses to assign the image to a particular category. For this technique, the network is a ConvNet in which global average pooling is applied to the final convolutional feature maps right before the softmax layer (for multiclass classification). The pooled values are used as features for a fully connected layer that produces the desired classification output. Given this simple connectivity structure, the importance of image regions can be identified by projecting the weights of the output layer back onto the convolutional feature maps.
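To make the projection step concrete, here is a minimal NumPy sketch of the CAM computation. With $f_k(x, y)$ the $k$-th feature map of the last convolutional layer and $w^c_k$ the output-layer weight linking the pooled value of $f_k$ to class $c$, the map is $M_c(x, y) = \sum_k w^c_k f_k(x, y)$. The random arrays stand in for a trained network's activations and weights, all function names are hypothetical, and nearest-neighbour upsampling by an integer factor is a simplification chosen here for brevity.

```python
import numpy as np

def global_average_pool(feature_maps):
    """GAP: average each of the K feature maps (K, H, W) over its spatial grid -> (K,)."""
    return feature_maps.mean(axis=(1, 2))

def class_scores(feature_maps, fc_weights, fc_bias):
    """Pre-softmax scores: S_c = sum_k w_k^c * GAP(f_k) + b_c."""
    return fc_weights @ global_average_pool(feature_maps) + fc_bias

def class_activation_map(feature_maps, fc_weights, c):
    """CAM for class c: M_c(x, y) = sum_k w_k^c * f_k(x, y), shape (H, W)."""
    return np.tensordot(fc_weights[c], feature_maps, axes=1)

def normalize_map(cam):
    """Rescale to [0, 1] for display (assumes a non-constant map)."""
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def upsample_nearest(cam, factor):
    """Hypothetical stand-in for resizing the CAM to the input image size."""
    return np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)

# Hypothetical sizes: 8 feature maps of 7x7, 5 classes, 224x224 input image
rng = np.random.default_rng(0)
K, H, W, C = 8, 7, 7, 5
feature_maps = rng.random((K, H, W))    # stand-in for last conv layer activations
fc_weights = rng.normal(size=(C, K))    # stand-in for output-layer weights w_k^c
fc_bias = rng.normal(size=C)

scores = class_scores(feature_maps, fc_weights, fc_bias)
c = int(np.argmax(scores))              # explain the top-scoring class
cam = class_activation_map(feature_maps, fc_weights, c)
heatmap = upsample_nearest(normalize_map(cam), 224 // H)
# GAP is linear, so the class score is the CAM's spatial mean plus the bias
print(np.isclose(scores[c], cam.mean() + fc_bias[c]))  # True
```

The final check spells out why the projection works: because averaging commutes with the weighted sum, $S_c = \frac{1}{HW}\sum_{x,y} M_c(x, y) + b_c$, so the locations where $M_c$ is large are exactly those contributing most to the class score.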

