An image classification model analyzes an image and identifies the ‘class’ the image falls under (or the probability of the image belonging to each ‘class’). A class is essentially a label, for instance, ‘car’, ‘animal’, or ‘building’.
Applications
Automated image organization; backbone for advanced tasks like object detection, pose estimation, and action recognition.
Scope
Multiclass and Multilabel classification
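The two settings differ mainly in how raw model scores are turned into predictions. Below is a minimal sketch, not a definitive implementation: the class names, the scores and the 0.5 threshold are illustrative assumptions. Multiclass applies a softmax and picks exactly one class; multilabel applies a sigmoid per class and keeps every class above the threshold.

```python
import math

CLASSES = ["car", "animal", "building"]  # illustrative labels only


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash a single raw score into an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_multiclass(logits):
    """Exactly one class per image: take the most probable one."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


def predict_multilabel(logits, threshold=0.5):
    """Zero or more classes per image: keep each class above the threshold."""
    return [name for name, x in zip(CLASSES, logits) if sigmoid(x) >= threshold]


logits = [2.0, -1.0, 0.5]  # assumed raw model outputs for one image
print(predict_multiclass(logits))
print(predict_multilabel(logits))
```

With these assumed scores, the multiclass path returns only 'car', while the multilabel path returns both 'car' and 'building'.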
Tools
TorchVision, TFHub
ResNet
Deep Residual Learning for Image Recognition. CVPR, 2016.
A very popular model that is often used as a backbone CNN to extract visual representations. It achieves a Top-1 accuracy of 76.1% on ImageNet (1000 categories).
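Top-1 accuracy, as quoted for the models in this section, counts a prediction as correct only when the highest-scoring class matches the true label. A minimal sketch of the metric, generalized to top-k; the scores and labels are made-up toy values, not ImageNet results:

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy per-class scores for four images and three classes (illustrative only).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 2, 2, 2]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```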
MobileNet
Searching for MobileNetV3. ICCV, 2019.
A lean mobile network that achieves an accuracy of 76.0% on ImageNet (1000 categories).
EfficientNet
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML, 2019.
It achieves a Top-1 accuracy of 81.3% on ImageNet (1000 categories).
BiT
Big Transfer (BiT): General Visual Representation Learning. arXiv, 2020.
It achieves a Top-1 accuracy of 85.4% on ImageNet (1000 categories).
ViT
An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. ICLR, 2021.
It showed that reliance on CNNs is not necessary: a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. This model achieves a Top-1 accuracy of 87.8% on ImageNet (1000 categories).
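To make "sequences of image patches" concrete, here is a minimal pure-Python sketch of the patch-splitting step only. The 224 x 224 RGB input size and the helper name `image_to_patches` are assumptions for illustration; the real model additionally projects each patch linearly and adds position embeddings, which this sketch does not cover.

```python
def image_to_patches(image, patch_size=16):
    """Split an H x W x C nested-list image into a sequence of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Dummy all-zero 224 x 224 RGB image (assumed input size).
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image)
print(len(patches), len(patches[0]))  # 196 768
```

A 224 x 224 image yields 14 x 14 = 196 patches, each flattened to 16 x 16 x 3 = 768 values, so the transformer sees a sequence of 196 tokens per image.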
Step 1: Create the training data
Capture the images via camera, scrape them from the internet, or use public datasets from sources like Kaggle and UCI. For labeling, we can either do it manually or outsource it via Amazon Mechanical Turk. Set up the database connection and fetch the data into the Python environment.
Step 2: Prepare the data
Explore the data, validate it, and create a preprocessing strategy. Clean the data and make it ready for modeling.
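One common part of preparing the data is holding out a validation set that keeps the class proportions of the full dataset. The sketch below is one possible approach, not the only one; the file names, labels, 20% validation fraction and helper name `stratified_split` are all illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into train/validation lists, preserving class ratios."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Keep at least one validation sample per class.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Hypothetical dataset: 10 'tulip' images and 30 'rose' images.
samples = [(f"img_{i}.jpg", "rose" if i % 4 else "tulip") for i in range(40)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 32 8
```

Because the split is done per class, the minority class ('tulip' here) is represented in both sets rather than landing in one of them by chance.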
Step 3: Train the model
Create the model architecture in Python and perform a sanity check. Start the training process and track the progress and experiments. Validate the final set of models and select/assemble the final model.
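A sanity check can be as simple as confirming the output shape on a dummy batch and verifying that the model can overfit a handful of samples. The sketch below assumes PyTorch; the tiny architecture, the random data and all hyperparameters are placeholders chosen for illustration, not a recommended setup.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the demo repeatable

NUM_CLASSES = 3  # assumed number of classes

# Deliberately tiny placeholder model for 8 x 8 RGB inputs.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 8 * 8, NUM_CLASSES),
)

# Random stand-in batch: 8 images with random labels.
images = torch.randn(8, 3, 8, 8)
labels = torch.randint(0, NUM_CLASSES, (8,))

# Check 1: one forward pass yields one score per class per image.
logits = model(images)
assert logits.shape == (8, NUM_CLASSES)

# Check 2: the loss should drop when repeatedly fitting the same small batch.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
losses = []
for step in range(30):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

If the loss does not fall on such a small batch, the problem is more likely in the model or training code than in the data, which makes this a cheap check to run before a full training job.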
Step 4: Test and Deliver
Wrap the model inference engine in an API for client testing. Deploy the model on the cloud or at the edge as per the requirement. Prepare the documentation and transfer all assets to the client.
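As a rough illustration of wrapping inference in an API, here is a minimal WSGI sketch that uses only the Python standard library. Everything specific in it is an assumption: the `/predict` route, the JSON payload shape, and `dummy_model`, which is a hypothetical stand-in for a trained classifier. In practice a web framework would usually take this role.

```python
import json


def dummy_model(pixels):
    """Hypothetical stand-in for a trained classifier: returns (label, score)."""
    mean = sum(pixels) / len(pixels)
    if mean >= 128:
        return "bright", mean / 255
    return "dark", 1 - mean / 255


def app(environ, start_response):
    """WSGI app exposing POST /predict with a JSON body like {"pixels": [...]}."""
    json_header = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", json_header)
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        label, score = dummy_model(payload["pixels"])
    except (ValueError, KeyError, TypeError, ZeroDivisionError):
        # Malformed JSON, missing key, wrong types, or an empty pixel list.
        start_response("400 Bad Request", json_header)
        return [b'{"error": "invalid request"}']
    body = json.dumps({"label": label, "score": round(score, 4)}).encode()
    start_response("200 OK", json_header + [("Content-Length", str(len(body)))])
    return [body]
```

For local client testing, `app` can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; port 8000 is an arbitrary choice.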
Flower Classification
Gradio App available. Check out this notion.
Traffic Sign Classification
Train a 43-class image classifier from scratch in Keras. This is available as a Streamlit app. A tutorial video is also available here on the notion.
STL-10 Object Classification
Fine-tune a 10-class classifier in PyTorch. Check out the notion here.
Plant Disease Classification
Available as a Streamlit App
Brain Tumor Classification
Available as a Streamlit App
TorchVision Pre-trained Classifiers
PyTorch TorchVision provides more than 10 pre-trained image classification models, which can be easily fine-tuned on a custom image dataset. Here I experimented with VGG11, AlexNet, ResNet18, and MobileNetV2.
EfficientNet Fine-tuning
Fine-tune EfficientNet in TF Keras to build a dog classifier. There are 120 classes of dogs. The data is available in TensorFlow datasets. The notion is available here.
BiT Fine-tuning
Fine-tune the Big Transfer (BiT) few-shot model. This model is available in TFHub. Check out the Colab.