
1. Import Libraries and Load the Dataset
We first import all the libraries we will need to train our model, and then load the MNIST dataset from CSV files. You can also use keras.datasets to import the MNIST dataset, as shown after the loading code below.
# Import Libraries
%matplotlib inline
import pandas as pd
from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.layers import Input, InputLayer, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D
from tensorflow.keras.layers import AveragePooling2D, MaxPooling2D, Dropout
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler
import keras
from tensorflow.keras import backend as K
from scipy.linalg import eigh

# Load the Dataset
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")
train.head()
[Image: sample images from the dataset]
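As an alternative to the CSV files, the same digits can be loaded directly through keras.datasets. Note that this version arrives as 28x28 image arrays rather than flattened 784-pixel rows, and uses the standard 60,000/10,000 train/test split:

from tensorflow.keras.datasets import mnist
(X_train_k, y_train_k), (X_test_k, y_test_k) = mnist.load_data()
print(X_train_k.shape, X_test_k.shape)
# (60000, 28, 28) (10000, 28, 28)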
2. Preprocess the Data
First, we will separate the training images from their labels (the test set comes in its own file, so no further split is needed):
y_train = train['label']
X_train = train.drop(['label'], axis=1)
X_test = test
X_train.shape, y_train.shape, X_test.shape
# Output
((42000, 784), (42000,), (28000, 784))

X_train = X_train / 255
X_test = X_test / 255
Note that we do not have y_test, because no labels were given for the test set. We also scaled the pixel values to the [0, 1] range above. Normalization matters for PCA, which projects the original data onto the directions that maximize the variance: features on larger scales would otherwise dominate those directions.
3. PCA Implementation
Now, we will implement PCA on our data. First we standardize the data and calculate its covariance matrix. Covariance can be thought of as the direction of the linear relationship between two variables. Computing the covariance matrix is usually the first step in dimensionality reduction, because it shows which features are strongly related, and strongly related features carry redundant information that can be discarded.
standardized_scalar = StandardScaler()
standardized_data = standardized_scalar.fit_transform(X_train)
standardized_data.shape
# Output
(42000, 784)

# Unnormalized covariance matrix; omitting the 1/(n-1) factor
# scales the eigenvalues but leaves the eigenvectors unchanged
cov_matrix = np.matmul(standardized_data.T, standardized_data)
cov_matrix.shape
# Output
(784, 784)
Next, we will calculate the eigenvalues and eigenvectors. The eigenvectors and eigenvalues of a covariance (or correlation) matrix are the core of PCA: the eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude, i.e. how much of the variance each direction explains.
# Calculate eigenvalues and eigenvectors
# eigvals=(782, 783) keeps only the two largest of the 784 eigenvalues;
# newer SciPy versions use subset_by_index=[782, 783] instead
lambdas, vectors = eigh(cov_matrix, eigvals=(782, 783))
vectors.shape
# Output
(784, 2)

vectors = vectors.T
vectors.shape
# (2, 784)
Then, we will project the data onto these vectors to get its new coordinates. The eigenvectors are unit vectors (vectors with a magnitude of 1), and we are looking for the unit vectors that maximize the variance.
new_coordinates = np.matmul(vectors, standardized_data.T)
print(new_coordinates.shape)
# Output
(2, 42000)

new_coordinates = np.vstack((new_coordinates, y_train)).T
df_new = pd.DataFrame(new_coordinates, columns=["f1", "f2", "labels"])
df_new.head()
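To see what the two retained components capture, we can scatter-plot them colored by digit label. This plotting code is not part of the original listing; a minimal matplotlib sketch:

for label in sorted(df_new["labels"].unique()):
    subset = df_new[df_new["labels"] == label]
    plt.scatter(subset["f1"], subset["f2"], s=2, label=int(label))
plt.legend(markerscale=4, title="digit")
plt.xlabel("f1")
plt.ylabel("f2")
plt.show()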
Let's see a visualization of the cumulative variance retained as the number of components grows (up to all 784):
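The code that produces this curve is not shown above; a minimal sketch using scikit-learn's PCA (already imported):

pca = PCA(n_components=784)
pca.fit(standardized_data)
cum_var = np.cumsum(pca.explained_variance_ratio_)
plt.plot(cum_var)
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance")
plt.grid(True)
plt.show()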
4. Create CNN Model
Before we build our CNN model, we need a few more preprocessing steps: converting the training set into arrays, reshaping the input data into the shape the model expects to receive, and one-hot encoding the image labels.
X_train = np.array(X_train)
y_train = np.array(y_train)

X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
print(X_train.shape, y_train.shape)
# Output
(42000, 28, 28, 1) (42000,)

nclasses = y_train.max() - y_train.min() + 1
y_train = to_categorical(y_train, num_classes=nclasses)
print("Shape of ytrain after encoding: ", y_train.shape)
# Output
Shape of ytrain after encoding:  (42000, 10)
Now we can build our CNN model. I built a 2D CNN with three convolutional blocks followed by a fully connected head.
input_shape = (28,28,1)
X_input = Input(input_shape)

# layer 1
x = Conv2D(64,(3,3),strides=(1,1),name='layer_conv1',padding='same')(X_input)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2),name='maxPool1')(x)
# layer 2
x = Conv2D(32,(3,3),strides=(1,1),name='layer_conv2',padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2),name='maxPool2')(x)
# layer 3
x = Conv2D(32,(3,3),strides=(1,1),name='conv3',padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2), name='maxPool3')(x)
# fc
x = Flatten()(x)
x = Dense(64,activation ='relu',name='fc0')(x)
x = Dropout(0.25)(x)
x = Dense(32,activation ='relu',name='fc1')(x)
x = Dropout(0.25)(x)
x = Dense(10,activation ='softmax',name='fc2')(x)

conv_model = Model(inputs=X_input, outputs=x, name='Predict')
conv_model.summary()
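The compile and fit calls are not shown in the original; below is a minimal training sketch. Only the 40 epochs come from the article, and the SGD import above suggests the optimizer family; the learning rate, momentum, batch size, and validation split are assumptions on my part:

# Hyperparameters here are assumptions, not the article's exact values
conv_model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
history = conv_model.fit(X_train, y_train,
                         epochs=40, batch_size=64,
                         validation_split=0.1)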
After 40 epochs, we achieved an accuracy of 99.8%!
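Since the test set has no labels, a natural last step is to predict them with the trained model; a minimal sketch (not from the original):

# Reshape the unlabeled test images and predict a digit for each one
X_test = np.array(X_test).reshape(-1, 28, 28, 1)
predicted_labels = np.argmax(conv_model.predict(X_test), axis=1)
print(predicted_labels[:10])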