Image Classification

An image classification model analyzes an image and identifies the ‘class’ the image falls under, or outputs the probability that the image belongs to each ‘class’. A class is essentially a label, for instance ‘car’, ‘animal’, or ‘building’.

Applications

Automated image organization, and serving as a backbone for advanced tasks such as object detection, pose estimation, and action recognition.

Scope

Multiclass and Multilabel classification
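
The difference between the two settings can be sketched in a few lines of plain Python. In multiclass classification exactly one class is chosen, typically by applying a softmax to the raw scores and taking the largest. In multilabel classification each class is decided independently, typically with a sigmoid and a threshold. The function names, the example scores, and the 0.5 threshold below are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (multiclass)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Map one raw score to an independent probability (multilabel)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_multiclass(logits, classes):
    """Pick the single most probable class."""
    probs = softmax(logits)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]

def predict_multilabel(logits, classes, threshold=0.5):
    """Keep every class whose own probability reaches the threshold."""
    return [c for c, x in zip(classes, logits) if sigmoid(x) >= threshold]

classes = ["car", "animal", "building"]
logits = [2.0, -1.0, 0.3]  # made-up raw model outputs

print(predict_multiclass(logits, classes))  # car
print(predict_multilabel(logits, classes))  # ['car', 'building']
```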

Tools

TorchVision, TFHub

ResNet

Deep Residual Learning for Image Recognition. CVPR, 2016.

A very popular model that is often used as a backbone CNN to extract visual representations. It achieves a Top 1 accuracy of 76.1% on ImageNet (1000 categories).
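
The Top 1 figures quoted in this section count a prediction as correct only when the highest-scoring class equals the true label. A minimal sketch of that metric, generalized to Top k, is shown below. The function name and the toy scores are assumptions for illustration and are not taken from any of the cited papers.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Toy scores for 4 images over 3 classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.1, 0.3],
]
labels = [1, 1, 2, 1]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```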

MobileNet

Searching for MobileNetV3. ICCV, 2019.

A lean network designed for mobile devices that achieves an accuracy of 76.0% on ImageNet (1000 categories).

EfficientNet

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML, 2019.

It achieves a Top 1 accuracy of 81.3% on ImageNet (1000 categories).

BiT

Big Transfer (BiT): General Visual Representation Learning. arXiv, 2020.

It achieves a Top 1 accuracy of 85.4% on ImageNet (1000 categories).

ViT

An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. ICLR, 2021.

It showed that reliance on CNNs is not necessary: a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. This model achieves a Top 1 accuracy of 87.8% on ImageNet (1000 categories).
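
To make "sequences of image patches" concrete, the sketch below cuts a 2D grid into non-overlapping square patches and flattens each one, which is the step that turns an image into a token sequence. It is a plain-Python illustration rather than the paper's implementation. The 224×224 input size is an assumed, commonly used resolution, and the helper names are hypothetical.

```python
def num_patches(height, width, patch_size=16):
    """Number of non-overlapping patches, i.e. the sequence length fed to the transformer."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)

def to_patches(image, patch_size):
    """Split a 2D grid (list of rows) into flattened patches, in row-major order."""
    h, w = len(image), len(image[0])
    num_patches(h, w, patch_size)  # reuse the divisibility check
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

print(num_patches(224, 224, 16))  # 196

# A tiny 4x4 "image" cut into 2x2 patches.
tiny = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(to_patches(tiny, 2)[0])  # [0, 1, 4, 5]
```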

Step 1: Create the training data

Capture the images with a camera, scrape them from the internet, or use public datasets from sources such as Kaggle or UCI. Labeling can be done manually or outsourced via Amazon Mechanical Turk. Set up the database connection and fetch the data into the Python environment.

Step 2: Prepare the data

Explore the data, validate it, and create a preprocessing strategy. Clean the data and make it ready for modeling.
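
One common part of getting data ready for modeling is holding out a validation set that keeps the class proportions of the full dataset. The sketch below shows a stratified split in plain Python. It is only one possible preprocessing step, and the function name, file names, labels, and 20% validation fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into train/val lists, preserving class ratios."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # At least one validation sample per class; a class with a single
        # image therefore ends up in validation only.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Hypothetical file list: 10 roses and 5 tulips.
samples = [(f"rose_{i}.jpg", "rose") for i in range(10)]
samples += [(f"tulip_{i}.jpg", "tulip") for i in range(5)]

train, val = stratified_split(samples, val_fraction=0.2)
print(len(train), len(val))  # 12 3
```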

Step 3: Train the model

Create the model architecture in Python and perform a sanity check. Start the training process and track progress and experiments. Validate the final set of models and select or assemble the final model.
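
One widely used sanity check, not necessarily the one intended above, is to compare the loss at initialization with its expected value. A freshly initialized classifier should predict roughly uniform probabilities, so its cross-entropy loss should be close to ln(C) for C classes. The sketch below uses 43 classes, borrowed from the traffic sign example in this document, and the function name is an assumption.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

num_classes = 43
uniform = [1.0 / num_classes] * num_classes  # what an untrained model should roughly output

loss = cross_entropy(uniform, label=7)  # any label gives the same value here
print(round(loss, 2))  # 3.76

# If the first training batches report a loss far from this value,
# the labels, the loss function, or the output layer deserve a second look.
```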

Step 4: Test and Deliver

Wrap the model inference engine in an API for client testing. Deploy the model on the cloud or at the edge as required. Prepare the documentation and transfer all assets to the client.
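
As a rough illustration of wrapping inference in an API, the sketch below exposes a prediction function over HTTP using only the Python standard library. The `classify` function is a hypothetical stand-in for a trained model, and the JSON request shape, port, and function names are assumptions. A real deployment would more likely use a web framework or one of the app tools mentioned in this document.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify(pixels):
    """Hypothetical stand-in for a trained model: scores an image by mean brightness."""
    brightness = sum(pixels) / (255 * len(pixels))
    return {"bright": brightness, "dark": 1.0 - brightness}

def handle_payload(body):
    """Decode a JSON request body and return (status_code, response_dict)."""
    try:
        pixels = json.loads(body)["pixels"]
        if not pixels:
            raise ValueError("empty image")
        probs = classify(pixels)
    except (KeyError, TypeError, ValueError) as err:
        # Malformed JSON, a missing field, or non-numeric pixels end up here.
        return 400, {"error": str(err)}
    label = max(probs, key=probs.get)
    return 200, {"label": label, "probabilities": probs}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_payload(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Keeping the request handling in `handle_payload`, separate from the HTTP class, lets the logic be tested without starting a server. Calling `serve()` starts listening on the chosen port.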

Flower Classification

A Gradio app is available. Check out the Notion page.

Traffic Sign Classification

Train a 43-class image classifier from scratch in Keras. It is available as a Streamlit app. A tutorial video is also available on the Notion page.

STL-10 Object Classification

Fine-tune a 10-class classifier in PyTorch. Check out the Notion page.

Plant Disease Classification

Available as a Streamlit App

Brain Tumor Classification

Available as a Streamlit App

TorchVision Pre-trained Classifiers

PyTorch TorchVision provides more than 10 pre-trained image classification models, which can be easily fine-tuned on a custom image dataset. Here I experimented with VGG11, AlexNet, ResNet18, and MobileNetV2.

EfficientNet Fine-tuning

Fine-tune EfficientNet in TF Keras to build a dog classifier covering 120 classes of dogs. The data is available in TensorFlow Datasets. A Notion page is also available.

BiT Fine-tuning

Fine-tune the Big Transfer (BiT) few-shot model, which is available in TFHub. Check out the Colab.
