## Visualizing Dyads
Consider an image from `skimage.data`. For simplicity, say that $A \in \mathbb{R}^{m \times n}$ is the matrix representing that image. You are asked to visualize the dyads of the SVD decomposition of $A$ and the result of compressing the image via SVD. In particular:
- Load the image into memory and compute its SVD;
- Visualize some of the dyads of this decomposition. What do you notice?
- Plot the singular values of $A$. Do you notice anything in particular?
- Visualize the $k$-rank approximation $A_k$ of $A$ for different values of $k$. What do you observe?
- Compute and plot the approximation error $\| A - A_k \|$ for increasing values of $k$, where $A_k$ is the $k$-rank approximation of $A$.
- Plot the compression factor $c_k$, i.e. the ratio between the amount of data needed to store the rank-$k$ approximation and the amount needed to store the original image, for increasing values of $k$.
- Compute the value $k^*$ such that $c_{k^*} = 1$ (i.e. when the compressed image requires the same amount of information as the uncompressed image). What is the approximation error for this value of $k$? Comment.
It is strongly recommended (but not mandatory) to consider a grayscale image for this exercise. You can also use an image downloaded from the web. Clearly, if your image is an RGB image, its shape will be `(m, n, 3)`, where the last dimension corresponds to the three channels (Red, Green, and Blue). Every point discussed in the homework has to be carried out on each channel separately, and the results then aggregated back into an RGB image. A minimal code sketch of these steps is given below.
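The sketch below is one possible way to carry out the steps above; the choice of `skimage.data.camera()` as the test image and the definition of the compression factor as the ratio between the number of values stored by the rank-$k$ factors and the number of pixels of the original image are illustrative assumptions, not prescriptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage import data

A = data.camera().astype(float)          # (m, n) grayscale image matrix (assumed example image)
m, n = A.shape

# SVD of the image matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Visualize a few dyads (here scaled by the corresponding singular value).
for i in [0, 1, 5, 20]:
    dyad = s[i] * np.outer(U[:, i], Vt[i, :])
    plt.imshow(dyad, cmap='gray'); plt.title(f'dyad {i}'); plt.show()

# Singular values (a log scale usually makes the decay easier to see).
plt.semilogy(s); plt.title('singular values'); plt.show()

# k-rank approximations, approximation errors and compression factors.
ks = [1, 5, 20, 50, 100]
errors, compression = [], []
for k in ks:
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    errors.append(np.linalg.norm(A - A_k))            # approximation error
    compression.append(k * (m + n + 1) / (m * n))     # stored values of U_k, s_k, V_k vs. pixels of A (assumed definition)
    plt.imshow(A_k, cmap='gray'); plt.title(f'k = {k}'); plt.show()

plt.plot(ks, errors, 'o-'); plt.title('approximation error'); plt.show()
plt.plot(ks, compression, 'o-'); plt.title('compression factor'); plt.show()
```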
## Classification of MNIST Digits with SVD Decomposition
For this exercise we aim to develop a classification algorithm for MNIST digits using the SVD decomposition. We recall that, given a matrix $X \in \mathbb{R}^{m \times n}$ and its SVD decomposition $X = U \Sigma V^T$, it is easy to show that an orthonormal basis for the space spanned by the columns of $X$ is given by the first $r$ columns of the matrix $U$, where $r$ is equal to the number of non-zero singular values of $X$. We will make use of this column space and of the following theorem:
Theorem 1. Let $V$ be a subspace of $\mathbb{R}^m$ with $\dim(V) = k$, and let $\{ v_1, \dots, v_k \}$ be an orthonormal basis of $V$. Then, for any $x \in \mathbb{R}^m$, the projection of $x$ onto $V$ has the following form:
$$
\pi_V(x) = \sum_{i=1}^{k} (v_i^T x) \, v_i.
$$
Corollary 1.1. Let $X \in \mathbb{R}^{m \times n}$ be a matrix with SVD decomposition $X = U \Sigma V^T$. Since the number $r$ of non-zero singular values is the dimension of the space spanned by the columns of $X$, and the first $r$ columns of $U$ form an orthonormal basis for that space, the projection of an $m$-dimensional vector $x$ onto this space can be easily computed as:
$$
\pi(x) = U_r U_r^T x,
$$
where $U_r \in \mathbb{R}^{m \times r}$ denotes the matrix containing the first $r$ columns of $U$.
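As a quick numerical sanity check of Corollary 1.1, here is a minimal sketch (with a random toy matrix standing in for $X$ and arbitrary variable names) comparing the projection $U_r U_r^T x$ with the sum formula of Theorem 1:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # toy matrix standing in for X
x = rng.standard_normal(100)         # vector to project

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))           # number of (numerically) non-zero singular values
U_r = U[:, :r]

# Corollary 1.1: projection of x onto the column space of X.
x_proj = U_r @ (U_r.T @ x)

# Theorem 1: same projection written as a sum over the orthonormal basis vectors.
x_proj_sum = sum((U_r[:, i] @ x) * U_r[:, i] for i in range(r))

print(np.allclose(x_proj, x_proj_sum))   # expected: True
```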
Consider as an example a binary classification problem, where we want to distinguish between hand-written digits representing the numbers 3 and 4. We will refer to the class of images representing the number 3 as $C_1$, and to the set of images representing the number 4 as $C_2$. Let $N_1$ be the number of elements in $C_1$, and $N_2$ be the number of elements in $C_2$. Let $X_1$ be the matrix whose columns are a flattened version of each digit in $C_1$, let $X_2$ be the matrix whose columns are a flattened version of each digit in $C_2$, and consider
$$
X_1 = U_1 \Sigma_1 V_1^T, \qquad X_2 = U_2 \Sigma_2 V_2^T,
$$
the SVD decompositions of the two matrices.
If $x$ is a new, unknown digit, we can predict its class through our classification algorithm by projecting it onto the spaces induced by the SVDs of $X_1$ and $X_2$ via:
$$
\tilde{x}_1 = U_1 U_1^T x, \qquad \tilde{x}_2 = U_2 U_2^T x,
$$
and classifying $x$ as an element of $C_1$ if the residual $d_1 = \| x - \tilde{x}_1 \|_2$ is lower than $d_2 = \| x - \tilde{x}_2 \|_2$, and as an element of $C_2$ otherwise. In this exercise, you are required to implement this idea in Python.
The description provided up to this point is only meant to convey the basic idea of the algorithm we aim to implement. From now on, I will list the points you are effectively required to implement in Python; therefore, I will start by re-defining some quantities, possibly overlapping with the discussion already made.
- Implement the binary classification algorithm discussed above for the digits 3 and 4 of the MNIST dataset (a sketch of the full pipeline is given after this list). Follow these steps:
    - Download the MNIST dataset from kaggle.com/datasets/animatronbot/mnist-digit-recognizer and load it into memory by following the steps we did in the PCA class. When loaded into memory, this dataset appears as an array containing the flattened versions of grayscale handwritten digits, plus a column representing the true class of the corresponding digit. By pre-processing the data as we did in class, you should obtain a matrix `X` containing the flattened digits, with shape `(784, 42000)`, and a vector `Y` of the associated digit values, with shape `(42000,)`.
    - Write a function that takes as input an index value `idx` and visualizes the image stored in the corresponding column of `X` (i.e. `X[:, idx]`). Use the function `plt.imshow`.
    - Filter from `X` only those elements that correspond to the digits 3 or 4. This can be done, for example, by using boolean slicing of `numpy` arrays, as already discussed in class.
    - Split the obtained dataset into a training set and a test set. From now on, we will only consider the training set. The test set will only be used at the end of the exercise to test the algorithm.
    - Call `X1` and `X2` the submatrices of the training set, filtered by the two selected digits, corresponding to the elements associated with the number 3 (class `C1`) and with the number 4 (class `C2`), respectively.
    - Compute the SVD decomposition of `X1` and `X2` with `np.linalg.svd(matrix, full_matrices=False)` and denote the $U$-part of the two decompositions as `U1` and `U2`.
    - Take an unknown digit $x$ from the test set, and compute $\tilde{x}_1 = U_1 U_1^T x$ and $\tilde{x}_2 = U_2 U_2^T x$.
    - Compute the distances $d_1 = \| x - \tilde{x}_1 \|_2$ and $d_2 = \| x - \tilde{x}_2 \|_2$, and classify $x$ as $C_1$ if $d_1 < d_2$, as $C_2$ if $d_2 < d_1$.
    - Repeat the experiment for different digits $x$ in the test set. Compute the misclassification rate of this algorithm.
    - Repeat the experiment for digits other than 3 or 4. Is there a relationship between the visual similarity of the digits and the classification error?
    - Comment on the obtained results.
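A possible sketch of the whole pipeline is reported below, assuming `X` (shape `(784, 42000)`) and `Y` (shape `(42000,)`) are already in memory as `numpy` arrays obtained as described above; the 80/20 split proportion, the random seed, and the tolerance used to discard numerically zero singular values are illustrative choices, not requirements of the exercise.

```python
import numpy as np

def show_digit(X, idx):
    """Visualize the digit stored in column idx of X."""
    import matplotlib.pyplot as plt
    plt.imshow(X[:, idx].reshape(28, 28), cmap='gray')
    plt.show()

# Keep only the digits 3 and 4 (boolean slicing on the columns of X).
mask = (Y == 3) | (Y == 4)
X34, Y34 = X[:, mask], Y[mask]

# Train/test split (the 80/20 proportion is an illustrative choice).
rng = np.random.default_rng(0)
perm = rng.permutation(X34.shape[1])
n_train = int(0.8 * X34.shape[1])
train_idx, test_idx = perm[:n_train], perm[n_train:]
X_train, Y_train = X34[:, train_idx], Y34[train_idx]
X_test, Y_test = X34[:, test_idx], Y34[test_idx]

# Class-wise matrices and their SVDs.
X1 = X_train[:, Y_train == 3]   # class C1: digit 3
X2 = X_train[:, Y_train == 4]   # class C2: digit 4
U1, s1, _ = np.linalg.svd(X1, full_matrices=False)
U2, s2, _ = np.linalg.svd(X2, full_matrices=False)

# Keep only the columns of U associated with (numerically) non-zero singular
# values, as in Corollary 1.1 (the tolerance is an arbitrary choice).
U1 = U1[:, s1 > 1e-10 * s1[0]]
U2 = U2[:, s2 > 1e-10 * s2[0]]

def classify(x):
    """Classify a flattened digit x by comparing the two projection residuals."""
    d1 = np.linalg.norm(x - U1 @ (U1.T @ x))
    d2 = np.linalg.norm(x - U2 @ (U2.T @ x))
    return 3 if d1 < d2 else 4

# Misclassification rate on the test set.
predictions = np.array([classify(X_test[:, i]) for i in range(X_test.shape[1])])
MR = np.mean(predictions != Y_test)
print(f"Misclassification rate: {MR:.4f}")
```

The same `classify` logic extends directly to the multi-class case discussed below: compute one residual per class and assign the digit to the class with the smallest residual.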
1. Given a classification algorithm $f$, which maps an input image into its predicted class, the misclassification rate on the test set is defined as:
$$
MR = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \iota(f(x_i) == y_i),
$$
where $N_{test}$ is the number of elements in the test set, $(x_i, y_i)$ represents the $i$-th element of the test set, and $\iota(f(x_i) == y_i)$ is a function which is equal to 0 if $f(x_i)$ is equal to the true class $y_i$, and equal to 1 if $f(x_i)$ guesses the wrong digit (i.e. it is different from $y_i$). More simply, the misclassification rate represents the average number of errors made by the model over the test set.
2. The extension of this idea to the multi-class classification task is trivial. Indeed, if we have more than 2 classes (say, $k$ different classes) $C_1, \dots, C_k$, we just need to repeat the same procedure as before for each matrix $X_1, \dots, X_k$ to obtain the distances $d_1, \dots, d_k$. Then, the new digit $x$ from the test set will be classified as $C_i$ if $d_i$ is lower than $d_j$ for each $j = 1, \dots, k$. Repeat the exercise above with a 3-digit example. Comment on the differences.

## Clustering with PCA

In this exercise we want to analyse the ability of PCA to cluster data by projecting very high-dimensional datapoints down to 2 or 3 dimensions. In particular, consider the same MNIST dataset used in the previous exercise. You are asked to:

* Load and pre-process the dataset as done in the previous exercise, to get the matrix `X` with shape `(784, 42000)` and the associated vector `Y`.
* Choose a number of digits (for example, 0, 6 and 9) and extract from `X` and `Y` the sub-dataset containing only the considered digits, as done in the previous exercise.
* Set $N_{train} < N$ and randomly sample a training set with $N_{train}$ datapoints from `X` and `Y`. Call them `X_train` and `Y_train`. Everything else is the test set; call it `X_test` and `Y_test`, correspondingly. This has to be done **after** filtering `X` and `Y` down to the selected digits.
* Implement the algorithm computing the PCA of `X_train` for a fixed value of $k$. Visualize the results (for $k = 2$) and the position of the centroid of each cluster. The clusters are identified by projecting `X_train` via PCA to its low-dimensional version `Z_train`, and then splitting it into sets (say, `Z1`, `Z2`, `Z3`) based on the digit that was represented in that position before the PCA projection. Each set `Z1`, `Z2`, `Z3` represents a cluster, of which we can easily compute the centroid.
* Compute, for each cluster, the average distance of its points from its centroid. Which property of the PCA projection does this quantity measure?
* Keeping the **same** projection matrix `P` obtained from the training set, project the test set `X_test` onto the low-dimensional space.
* Identify the clusters in `X_test` by using the information in `Y_test`, similarly to what we did in the previous point. Consider the centroids computed from the training set. For each cluster in the test set, compute the average distance to the corresponding centroid (from the training set). Comment on the results.
* Define a classification algorithm in this way: given a new observation `x`, compute the distance between `x` and each cluster centroid computed on the training set, and assign `x` to the class corresponding to the closest centroid. Compute the misclassification rate of this algorithm on the test set.
* Repeat this experiment for different values of $k$ and different digits. What do you observe?
* Compare this classification algorithm with the one defined in the previous exercise. Which one performs better?

---

Other assignments:

- [[SMMAI Homework 1|Homework 1]]
- [[SMMAI Homework 3|Homework 3]]
- [[SMMAI Homework 4|Homework 4]]