Cassava Leaf Disease Classification with Deep Learning: Part I

The dataset comes from Kaggle and contains 21,397 images in the training set and around 15,000 images in the test set. It has two variables: Image_id is the image file name, and label is the ID of the disease category. The 5 classes of Cassava Leaf Diseases are shown below:

Table 1: Cassava Leaf Diseases Labels

Missing Values

We checked the fraction of missing values in each variable and the fraction of data points with at least one missing value. As Table 2 shows, there is no need to handle missing values.
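A check like this can be sketched with pandas. The tiny DataFrame below is a hypothetical stand-in for the training table (the file names, labels, and lowercase column names are made up for illustration); with the real data you would load the Kaggle CSV with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical stand-in for the training table; the real data would come
# from the Kaggle CSV via pd.read_csv.
df = pd.DataFrame(
    {
        "image_id": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
        "label": [0, 3, 3, 1],
    }
)

# Fraction of missing values in each variable (column).
missing_per_column = df.isna().mean()

# Fraction of data points (rows) with at least one missing value.
missing_rows = df.isna().any(axis=1).mean()

print(missing_per_column)
print(f"Rows with missing values: {missing_rows:.1%}")
```

If both numbers come out as zero, as they do for this toy table, no imputation or row dropping is needed.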

Table 2: Missing Value Information

Imbalance

The figure below shows that the dataset is imbalanced: “Cassava Mosaic Disease (CMD)” has 13,158 instances, which account for about 61.5% of the whole dataset, whereas “Cassava Bacterial Blight (CBB)” has only 1,087 instances, or about 5.1% of the whole dataset.
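The percentages follow directly from the counts quoted above. The short sketch below redoes the arithmetic for the two classes mentioned in the text; the variable names are our own.

```python
# Counts quoted in the text above.
total_images = 21_397
class_counts = {
    "Cassava Mosaic Disease (CMD)": 13_158,
    "Cassava Bacterial Blight (CBB)": 1_087,
}

# Share of the training set taken up by each of the two classes.
class_share = {name: count / total_images for name, count in class_counts.items()}

for name, share in class_share.items():
    print(f"{name}: {share:.1%}")

# Ratio between the largest and the smallest class.
imbalance_ratio = (
    class_counts["Cassava Mosaic Disease (CMD)"]
    / class_counts["Cassava Bacterial Blight (CBB)"]
)
print(f"CMD to CBB ratio: {imbalance_ratio:.1f}")
```

A ratio of roughly 12 to 1 between the largest and smallest class is worth keeping in mind when choosing evaluation metrics later on.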

Figure 1: Cassava Leaf Diseases Frequency Bar Plot

Images of Each Class

Next, let’s take a look at different classes of Cassava Leaf Diseases by randomly picking some images.

Figure 2: Cassava Leaf Diseases of Each Class

RGB Information

As we know, computers store images as a mosaic of tiny squares called pixels (short for picture elements), and the pixel is the basic unit from which information in an image is extracted. Each pixel is a combination of three colors: Red, Green, and Blue. In an RGB image, each of these channels is represented by an 8-bit number, so it can take 256 different intensity (brightness) values, from 0 to 255. The table below shows the RGB information of a random image [1].

Figure 3: a Cassava Leaf Disease Image

Table 3: Basic Properties of an Image

Because each pixel is described by three integers, the image is stored as a three-layered matrix (height × width × 3), where index 0 is the red channel, 1 is the green channel, and 2 is the blue channel. Figure 4 below gives a quick view of each channel in this image.
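In NumPy terms, this layout looks roughly like the sketch below. The 2x2 array is a hypothetical stand-in for a real photo, and we assume the channels are stored in RGB order (some libraries, such as OpenCV, load images in BGR order instead).

```python
import numpy as np

# Hypothetical 2x2 RGB image; a real photo would be loaded from disk instead.
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],
        [[0, 0, 255], [30, 120, 40]],
    ],
    dtype=np.uint8,
)

height, width, channels = image.shape  # (rows, columns, color channels)

# Index the last axis to pull out one channel as a 2-D matrix.
red = image[:, :, 0]
green = image[:, :, 1]
blue = image[:, :, 2]

print(image.shape, image.dtype)
print("pixel at row 1, column 1:", image[1, 1])
```

Each of `red`, `green`, and `blue` is a plain height-by-width matrix of intensities, which is what a per-channel view like Figure 4 displays.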

Figure 4: RGB Channels of an Image

The image can also be split into its separate color components: Red, Green, and Blue.
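One way to produce such a split, sketched under the assumption that the image is an RGB NumPy array, is to keep a single channel and set the other two to zero. The helper name `keep_channel` and the sample pixels are ours, for illustration only.

```python
import numpy as np

def keep_channel(image: np.ndarray, channel: int) -> np.ndarray:
    """Return a copy of `image` with every channel except `channel` set to 0."""
    component = np.zeros_like(image)
    component[:, :, channel] = image[:, :, channel]
    return component

# Hypothetical one-row image with two pixels, used only for illustration.
image = np.array([[[200, 150, 50], [10, 220, 30]]], dtype=np.uint8)

red_only = keep_channel(image, 0)
green_only = keep_channel(image, 1)
blue_only = keep_channel(image, 2)

# The three components add back up to the original image.
recombined = red_only + green_only + blue_only
```

Displayed as images, `red_only`, `green_only`, and `blue_only` appear as red-, green-, and blue-tinted versions of the original.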

Figure 5: Split an Image into Three Layers

Colored 3D Scatter Plot

From the RGB Information section, we already know that RGB is the most common color space. However, many other color spaces exist, each suited to specific goals. In this blog, we use the RGB and HSV color spaces for color segmentation and to visualize the color distribution of the image shown in Figure 6.

Figure 6: Cassava Leaf

RGB Color Space

In RGB color space, an image is split along the R, G, and B channels, so each axis in Figure 7 represents one of these channels. The figure below shows that the green points are spread across nearly the whole plot, so it would be hard to segment the leaf in RGB space based on these RGB values [2].
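The data behind a colored 3D scatter plot like this can be prepared as sketched below: every pixel becomes one point whose coordinates are its R, G, and B values, and the point is drawn in the pixel's own color. The 2x2 image is a hypothetical stand-in for the leaf photo.

```python
import numpy as np

# Hypothetical 2x2 RGB image standing in for the leaf photo.
image = np.array(
    [
        [[34, 139, 34], [50, 205, 50]],
        [[0, 100, 0], [139, 69, 19]],
    ],
    dtype=np.uint8,
)

# One row per pixel, one column per channel: these are the x, y, z coordinates.
pixels = image.reshape(-1, 3)
r, g, b = pixels[:, 0], pixels[:, 1], pixels[:, 2]

# Per-point colors scaled to [0, 1], the range plotting libraries usually expect.
point_colors = pixels / 255.0
```

With matplotlib, for example, these arrays could be passed to a 3D axes as `ax.scatter(r, g, b, c=point_colors)`.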

Figure 7: an Image in RGB Color Space

HSV Color Space

Unlike RGB color space, HSV color space is a cylindrical color space. Images in this color space are split with respect to Hue, Saturation, and Value (Brightness). The Hue channel is an angular dimension. The Value channel is the vertical axis, where smaller values are darker and larger values are brighter. The third axis is the Saturation channel, which runs from least saturated at the vertical axis to most saturated farthest away from the center [2].
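To make the three HSV axes concrete, the sketch below converts two greens with Python's standard `colorsys` module. The two colors are hypothetical examples of the same leaf in shadow and in sunlight, chosen by us for illustration.

```python
import colorsys

# Two hypothetical leaf greens: one in shadow, one in sunlight (RGB scaled to [0, 1]).
dark_green = (0.1, 0.4, 0.1)
light_green = (0.5, 0.9, 0.5)

# colorsys returns hue, saturation and value, each in the range [0, 1].
dark_hsv = colorsys.rgb_to_hsv(*dark_green)
light_hsv = colorsys.rgb_to_hsv(*light_green)

# Hue is an angle on the color wheel; multiply by 360 to read it in degrees.
print("dark :", [round(x, 3) for x in dark_hsv])
print("light:", [round(x, 3) for x in light_hsv])
print("hue in degrees:", dark_hsv[0] * 360, light_hsv[0] * 360)
```

All three RGB values differ between the two colors, yet their hue is the same; only saturation and value change.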

The figure below shows that the leaf’s greens are much more localized in HSV space. In other words, the colors are easier to separate visually. In addition, although the saturation and value of the greens vary, they mostly fall within a narrow range along the hue axis. Therefore, it is easier to perform color segmentation or extract important information in this color space [2].
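Because the greens sit in a narrow hue range, a simple segmentation can be sketched as a threshold on hue. The hue window, the saturation cutoff, and the helper name `green_mask` below are our own assumptions for illustration, not values taken from the project.

```python
import colorsys
import numpy as np

# Assumed hue window for "green", as fractions of the color wheel
# (about 72 to 162 degrees).
GREEN_HUE_MIN, GREEN_HUE_MAX = 0.20, 0.45
MIN_SATURATION = 0.25  # ignore washed-out, nearly gray pixels

def green_mask(image: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where a pixel falls in the green hue window."""
    height, width, _ = image.shape
    mask = np.zeros((height, width), dtype=bool)
    for row in range(height):
        for col in range(width):
            r, g, b = image[row, col] / 255.0
            hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
            mask[row, col] = (
                GREEN_HUE_MIN <= hue <= GREEN_HUE_MAX
                and saturation >= MIN_SATURATION
            )
    return mask

# Hypothetical 2x2 image: dark green, light green, brown soil, gray sky.
image = np.array(
    [
        [[25, 100, 25], [130, 230, 130]],
        [[139, 69, 19], [200, 200, 200]],
    ],
    dtype=np.uint8,
)

mask = green_mask(image)
print(mask)
```

The per-pixel loop keeps the sketch readable; for full-size photos a vectorized conversion would be much faster.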

Figure 8: an Image in HSV Color Space

Image Augmentation

Image augmentation is an important technique in image classification projects. It lets us apply various transformations to images in order to expand the original dataset, avoid the memory overhead of storing augmented copies, and make the model more robust. In this project, “ImageDataGenerator” is used to apply different random transformations to the original images, including rotations, shifts, flips, brightness changes, zoom, and shear. We chose “ImageDataGenerator” because it can provide real-time data augmentation during future model training. In this section, we share some interesting transformations; feel free to check all types of transformations at our GitHub repository [3].
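A configuration along these lines could look like the sketch below. The keys are parameter names of Keras’ “ImageDataGenerator”, but every value is an illustrative assumption rather than the setting used in this project. The configuration is kept as a plain dictionary so that it can be inspected without TensorFlow installed.

```python
# Keyword arguments for Keras' ImageDataGenerator. The parameter names are the
# generator's own; every value below is an illustrative assumption.
augmentation_config = {
    "rotation_range": 40,            # degree range for random rotations
    "width_shift_range": 0.2,        # fraction of total width
    "height_shift_range": 0.2,       # fraction of total height
    "brightness_range": (0.5, 1.5),  # range to pick a brightness value from
    "zoom_range": 0.3,               # zoom factor drawn from [1 - 0.3, 1 + 0.3]
    "shear_range": 15,               # shear intensity (angle in degrees)
    "horizontal_flip": True,
    "vertical_flip": True,
    "fill_mode": "nearest",          # how pixels uncovered by a transform are filled
}

# With TensorFlow installed, the generator could be built like this:
#   from tensorflow.keras.preprocessing.image import ImageDataGenerator
#   datagen = ImageDataGenerator(**augmentation_config)
```

Because the generator draws new random parameters for every batch, the augmented images are produced on the fly during training.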

Random Shifts

Image shift is an augmentation method that changes the position of objects in an image. One reason to use it is that the object is not always centered in the image. In “ImageDataGenerator”, the parameters “width_shift_range” and “height_shift_range” set the maximum horizontal and vertical shift as a fraction of the total width and total height; each image is then translated by a random amount within that range. The figure below shows how shifts work on this image.
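To show what a random shift does to the pixel grid, here is a minimal NumPy sketch. It is not Keras’ implementation: the helper `shift_image` is our own, it fills uncovered pixels with 0 (Keras’ default fill mode repeats the nearest edge pixels instead), and the shift fractions are assumed values.

```python
import numpy as np

def shift_image(image: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Translate `image` by dx columns and dy rows, filling uncovered pixels with 0."""
    height, width = image.shape[:2]
    shifted = np.zeros_like(image)
    if abs(dx) >= width or abs(dy) >= height:
        return shifted  # the content is shifted completely out of the frame

    # Source and destination windows that overlap after the translation.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

rng = np.random.default_rng(seed=0)
image = np.arange(1, 76, dtype=np.uint8).reshape(5, 5, 3)  # hypothetical 5x5 image

width_shift_range = 0.4   # at most 40% of the width (assumed value)
height_shift_range = 0.4  # at most 40% of the height (assumed value)

# Draw a random integer shift within the allowed fraction of each dimension.
max_dx = int(width_shift_range * image.shape[1])
max_dy = int(height_shift_range * image.shape[0])
dx = int(rng.integers(-max_dx, max_dx + 1))
dy = int(rng.integers(-max_dy, max_dy + 1))

augmented = shift_image(image, dx, dy)
```

Positive `dx` moves the content to the right and positive `dy` moves it down; the image size stays the same.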

Figure 9: Random Shifts

Random Brightness

Brightness adjustment is an augmentation method that changes how bright or dark an image is. One reason to use it is that objects are not always clearly visible under extreme lighting conditions such as darkness. In “ImageDataGenerator”, the parameter “brightness_range” sets the range from which a brightness shift value is randomly picked. The figure below shows how brightness changes work on this image.
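The idea can be sketched in NumPy as scaling every pixel by a randomly drawn factor. This is a simplified illustration rather than Keras’ own code; the helper name `adjust_brightness` and the range are assumptions.

```python
import numpy as np

def adjust_brightness(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor` and clip to the valid 8-bit range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

rng = np.random.default_rng(seed=0)
brightness_range = (0.5, 1.5)  # assumed range; below 1 darkens, above 1 brightens

image = np.full((2, 2, 3), 100, dtype=np.uint8)  # hypothetical flat gray image

# Pick one random factor for the whole image, then apply it to every pixel.
factor = rng.uniform(*brightness_range)
augmented = adjust_brightness(image, factor)
```

Clipping matters here: without it, bright pixels multiplied by a factor above 1 would overflow the 8-bit range.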

Figure 10: Random Brightness

Random Zoom

Image zoom is another useful augmentation method that either zooms in on or zooms out of an image. In “ImageDataGenerator”, the parameter “zoom_range” sets the range from which a random zoom factor is drawn. The figure below shows how zoom works on this image.
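A rough NumPy sketch of a zoom about the image center is shown below. It uses nearest-neighbor sampling and a single factor for both axes, which is a simplification of what a real augmentation library does. The helper `zoom_image`, the zero fill, and the range are our assumptions; in this sketch a factor below 1 zooms in and a factor above 1 zooms out.

```python
import numpy as np

def zoom_image(image: np.ndarray, factor: float) -> np.ndarray:
    """Zoom about the image center with nearest-neighbor sampling.

    factor < 1 zooms in (the center is magnified); factor > 1 zooms out,
    and pixels that fall outside the original frame are filled with 0.
    """
    height, width = image.shape[:2]
    center_row, center_col = (height - 1) / 2, (width - 1) / 2

    # For every output row and column, find the source index it samples from.
    rows = np.rint(center_row + (np.arange(height) - center_row) * factor).astype(int)
    cols = np.rint(center_col + (np.arange(width) - center_col) * factor).astype(int)

    valid_rows = (rows >= 0) & (rows < height)
    valid_cols = (cols >= 0) & (cols < width)

    zoomed = np.zeros_like(image)
    zoomed[np.ix_(valid_rows, valid_cols)] = image[
        np.ix_(rows[valid_rows], cols[valid_cols])
    ]
    return zoomed

rng = np.random.default_rng(seed=0)
zoom_range = 0.3  # assumed value: factor drawn from [1 - 0.3, 1 + 0.3]

image = np.arange(1, 76, dtype=np.uint8).reshape(5, 5, 3)  # hypothetical 5x5 image
factor = rng.uniform(1 - zoom_range, 1 + zoom_range)
augmented = zoom_image(image, factor)
```

A factor of exactly 1 leaves the image unchanged, which makes for a quick sanity check of the helper.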

Figure 11: Random Zoom

Put All Things Together

Let’s see how a combination of these image augmentation transformations works on this image!

Figure 12: Combined Transformation