Pre-processing, or data preparation, of a popular image dataset: CIFAR-100
Recognizing images is a simple task for humans because it is easy for us to distinguish between different features. Our brains have been trained, unconsciously, on many similar and dissimilar images, which lets us tell features (images) apart without putting much effort into the task. For instance, after seeing a few cats, we can recognize almost every type of cat we encounter in our life. Machines, however, need a lot of training for feature extraction, which becomes a challenge due to the high computation cost, memory requirement, and processing power involved.
In this article, we will discuss the pre-processing of one such use case. So, let's dive deeper and understand how we can pre-process an image dataset to build a convolutional neural network model.
Note:
- I will try to make most of the concepts clear, but this article still assumes a basic understanding of Convolutional Neural Networks (CNNs).
- I have used a Jupyter notebook to write my code.
Convolutional Neural Network (CNN) is a class of deep neural networks commonly used to analyze images. A convolutional neural network model can be built to correctly recognize and classify colored images of objects into one of the 100 available classes of the CIFAR-100 dataset.
So, let's get started.
CIFAR-100 is a labeled subset of the 80 Million Tiny Images dataset, where CIFAR stands for Canadian Institute For Advanced Research. The images were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The dataset consists of 60,000 colored images (50,000 training and 10,000 test) of 32 × 32 pixels in 100 classes grouped into 20 superclasses. Each image has a fine label (its class) and a coarse label (its superclass).
The Python version of this dataset can be downloaded from the website of the University of Toronto Computer Science. The downloaded files are Python pickled objects produced using cPickle. Do not worry about this for now. We will go through each step together to use this dataset.
After downloading the dataset from the website, we need to load it into our Jupyter notebook. The files obtained from the website are Python pickled objects. The folder structure after unzipping looks like this:
We can see that we have separate train and test files, and a meta file.
Python's pickle (or cPickle) module can be used to serialize and deserialize objects in Python. Here, I am using pickle. Its load() method can be used to read these files and inspect their structure. Read this to know more about pickling.
Pickle works with binary data, so we will open the files in 'rb' mode and load them using the pickle load() method with 'latin1' encoding.
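As a quick illustration, here is a tiny, self-contained round trip with a toy dictionary (not the actual dataset contents) that mirrors how the CIFAR-100 files were produced and how we will read them:

```python
import pickle

#toy record shaped like a CIFAR-100 file entry (not real dataset contents)
record = {'batch_label': 'training batch 1 of 1', 'fine_labels': [19, 29]}

#serialize the dict to bytes, as the dataset authors did with cPickle
blob = pickle.dumps(record)

#deserialize it back, using the same 'latin1' encoding we will use below
restored = pickle.loads(blob, encoding='latin1')
print(restored == record)  # True
```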
Let us first import the libraries which we will use in pre-processing.
import pickle
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pylab import rcParams
%matplotlib inline
import keras
from keras.utils import to_categorical
Here is the code to read these files.
#function to read files present in the Python version of the dataset
def unpickle(file):
    with open(file, 'rb') as fo:
        myDict = pickle.load(fo, encoding='latin1')
    return myDict
Read this to know why we mostly use 'latin1' as the encoding.
Let us now load our training set.
trainData = unpickle('train')

#type of items in each file
for item in trainData:
    print(item, type(trainData[item]))
The output looks like this:
filenames <class 'list'>
batch_label <class 'str'>
fine_labels <class 'list'>
coarse_labels <class 'list'>
data <class 'numpy.ndarray'>
The training file has the above items in it. coarse_labels and fine_labels are the labels of the images (20 and 100 unique values, respectively), data holds the image data as a NumPy array, filenames is a list of the file names, and batch_label is the label of the batch.
Let us check the length of the dataset.
print(len(trainData['data']))
print(len(trainData['data'][0]))
The output looks like this:
50000
3072
So, there are 50,000 images in the training dataset, and each image is a 3-channel 32 × 32 pixel image (32 × 32 × 3 = 3072).
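A quick note on how those 3072 values are laid out, per the CIFAR dataset description: each row is stored channel-first, i.e. the first 1024 entries are the red channel, the next 1024 the green, and the last 1024 the blue, each channel in row-major order as a 32 × 32 grid. A small sketch with a dummy vector (not real image data) makes the mapping concrete:

```python
import numpy as np

#dummy flat vector standing in for one CIFAR-100 row (values 0..3071)
flat = np.arange(3072)

#channel-first view: shape (3, 32, 32) -> red, green, and blue planes
planes = flat.reshape(3, 32, 32)

#for the top-left pixel, the red value is the very first entry,
#the green value sits 1024 entries later, and the blue 2048 entries later
print(planes[0, 0, 0], planes[1, 0, 0], planes[2, 0, 0])  # 0 1024 2048
```

This channel-first layout is why, later on, a reshape alone will not be enough; a transpose step is needed to move the channels to the last axis.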
Let us have a look at the unique fine labels.
print(np.unique(trainData['fine_labels']))
The output looks like this:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
So, there are 100 different fine labels for the images ranging from 0 to 99.
Let us now have a look at the unique coarse labels.
print(np.unique(trainData['coarse_labels']))
The output looks like this:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
So, there are 20 different coarse labels for the images ranging from 0 to 19.
Let us check what is in the batch_label entry.
print(trainData['batch_label'])
The output looks like this:
training batch 1 of 1
Here we have only one batch, so batch_label is a string stating that.
As we are done with exploring the different files in the training dataset except for the data file itself, let us first unpickle our test dataset and meta file.
testData = unpickle('test')
metaData = unpickle('meta')

#metaData
print("Fine labels:", metaData['fine_label_names'], "\n")
print("Coarse labels:", metaData['coarse_label_names'])
The meta file has a dictionary of fine and coarse label names. For clarity, I have printed them separately. Here is the output.
Fine labels: ['apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle', 'bicycle', 'bottle', 'bowl', 'boy', 'bridge', 'bus', 'butterfly', 'camel', 'can', 'castle', 'caterpillar', 'cattle', 'chair', 'chimpanzee', 'clock', 'cloud', 'cockroach', 'couch', 'crab', 'crocodile', 'cup', 'dinosaur', 'dolphin', 'elephant', 'flatfish', 'forest', 'fox', 'girl', 'hamster', 'house', 'kangaroo', 'keyboard', 'lamp', 'lawn_mower', 'leopard', 'lion', 'lizard', 'lobster', 'man', 'maple_tree', 'motorcycle', 'mountain', 'mouse', 'mushroom', 'oak_tree', 'orange', 'orchid', 'otter', 'palm_tree', 'pear', 'pickup_truck', 'pine_tree', 'plain', 'plate', 'poppy', 'porcupine', 'possum', 'rabbit', 'raccoon', 'ray', 'road', 'rocket', 'rose', 'sea', 'seal', 'shark', 'shrew', 'skunk', 'skyscraper', 'snail', 'snake', 'spider', 'squirrel', 'streetcar', 'sunflower', 'sweet_pepper', 'table', 'tank', 'telephone', 'television', 'tiger', 'tractor', 'train', 'trout', 'tulip', 'turtle', 'wardrobe', 'whale', 'willow_tree', 'wolf', 'woman', 'worm']
Coarse labels: ['aquatic_mammals', 'fish', 'flowers', 'food_containers', 'fruit_and_vegetables', 'household_electrical_devices', 'household_furniture', 'insects', 'large_carnivores', 'large_man-made_outdoor_things', 'large_natural_outdoor_scenes', 'large_omnivores_and_herbivores', 'medium_mammals', 'non-insect_invertebrates', 'people', 'reptiles', 'small_mammals', 'trees', 'vehicles_1', 'vehicles_2']
Our task will be to recognize images and assign them their fine labels.
Let us now create dataframes using the labels, which will help us in visualization.
#storing coarse labels along with their number codes in a dataframe
category = pd.DataFrame(metaData['coarse_label_names'], columns=['SuperClass'])

#storing fine labels along with their number codes in a dataframe
subCategory = pd.DataFrame(metaData['fine_label_names'], columns=['SubClass'])

print(category)
print(subCategory)
A glimpse of the two dataframes:
Let us now look at our data.
X_train = trainData['data']
X_train
The output is a NumPy array.
array([[255, 255, 255, ..., 10, 59, 79],
[255, 253, 253, ..., 253, 253, 255],
[250, 248, 247, ..., 194, 207, 228],
...,
[248, 240, 236, ..., 180, 174, 205],
[156, 151, 151, ..., 114, 107, 126],
[ 31, 30, 31, ..., 72, 69, 67]], dtype=uint8)
To perform the task of image recognition and classification, we will build a convolutional neural network, which requires a 4D array as input. So, the data has to be transformed into that shape.
For instance, the training dataset has 50,000 images with the shape (50000, 3072), so we need to transform these images into the following shape using the reshape and transpose operations of a NumPy array:
(Number of instances × Width × Height × Depth)
The width, height, and depth are the dimensions of the image where depth is nothing but the number of color channels in the image which is 3 in our case as the images are RGB. The following diagram illustrates the form of 4D input for the convolutional neural network model.
Let us write code for this transformation of images.
#4D array input for building the CNN model using Keras
X_train = X_train.reshape(len(X_train),3,32,32).transpose(0,2,3,1)
#X_train
We are now done with our transformation. Let us create visualizations to see these images.
#displaying a random image from the dataset along with the label's number and name

#setting the figure size
rcParams['figure.figsize'] = 2, 2

#generating a random number
imageId = np.random.randint(0, len(X_train))

#showing the image at that id
plt.imshow(X_train[imageId])

#setting display off for the image
plt.axis('off')

#displaying the image id
print("Image number selected : {}".format(imageId))

#displaying the shape of the image
print("Shape of image : {}".format(X_train[imageId].shape))

#displaying the category number
print("Image category number: {}".format(trainData['coarse_labels'][imageId]))

#displaying the category name
print("Image category name: {}".format(category.iloc[trainData['coarse_labels'][imageId]][0].capitalize()))

#displaying the subcategory number
print("Image subcategory number: {}".format(trainData['fine_labels'][imageId]))

#displaying the subcategory name
print("Image subcategory name: {}".format(subCategory.iloc[trainData['fine_labels'][imageId]][0].capitalize()))
The output looks like this:
Let us display some more images.
#16 random images to display at a time along with their true labels

#setting the figure size
rcParams['figure.figsize'] = 8, 8

#number of rows and columns in which images need to be displayed
num_row = 4
num_col = 4

#to get 4 * 4 = 16 images together
imageId = np.random.randint(0, len(X_train), num_row * num_col)

#creating subplots
fig, axes = plt.subplots(num_row, num_col)

#main title of the plot
plt.suptitle('Images with True Labels', fontsize=18)

#displaying images as subplots
for i in range(0, num_row):
    for j in range(0, num_col):
        k = (i * num_col) + j
        axes[i, j].imshow(X_train[imageId[k]])
        axes[i, j].set_title(subCategory.iloc[trainData['fine_labels'][imageId[k]]][0].capitalize())
        axes[i, j].axis('off')
We can see from the visualization that the quality of the images is low and the position of the object within each image varies a lot. It would be difficult to train a model to recognize and classify such images.
Let us now work on the test dataset.
#transforming the testing dataset
X_test = testData['data']
X_test = X_test.reshape(len(X_test), 3, 32, 32).transpose(0, 2, 3, 1)

y_train = trainData['fine_labels']
y_test = testData['fine_labels']
To make predictions, the labels of the images have to be converted from their existing 1D structure to a categorical (one-hot) matrix.
#number of classes in the dataset
n_classes = 100

y_train = to_categorical(y_train, n_classes)
y_test = to_categorical(y_test, n_classes)
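To see what this conversion produces, here is a minimal NumPy equivalent of the one-hot encoding (an illustrative sketch, not the actual Keras implementation of to_categorical):

```python
import numpy as np

def one_hot(labels, n_classes):
    #index the rows of an identity matrix by label: label k becomes a row
    #with a 1 in position k and 0 everywhere else
    return np.eye(n_classes)[np.asarray(labels)]

demo = one_hot([0, 2, 1], 3)
print(demo)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```

With n_classes = 100, each label becomes a length-100 row, which is the shape the model's softmax output layer will expect.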
We are now done with our pre-processing and we will look at how to build a convolutional neural network model for this dataset in another article.
Here is the link to the GitHub repository which has all this code. Please feel free to use it in your work to train a classic CNN model that can classify images.