Custom Image Dataset using a Python Library
Data is the basic requirement of any data science project. Depending on the project, a dataset may take the form of audio, video, text, images, and so on. A sufficiently large dataset is required to train a robust machine learning or deep learning model.
Often a suitable image dataset for a particular project cannot be found. Searching for images on the web, downloading them, and annotating them manually takes a lot of time and manpower. This article shows how to prepare a custom image dataset, usable for training deep learning models, with a few lines of Python code and the Python library bing_image_downloader.
Bing is a web search engine developed by Microsoft. Bing Image Downloader is a Python library for downloading images in bulk from Bing.com. The library is described as using async URL requests, which makes downloading fast.
Installation:
The library can be installed using pip:
pip install bing-image-downloader
Alternatively, clone the GitHub repository:
git clone https://github.com/gurugaurav/bing_image_downloader
Usage:
A dogs-versus-cats classifier is one of the best beginner projects for convolutional neural networks (CNNs), and its dataset is freely available in just a few clicks. Suppose, however, that someone wants to build a CNN that predicts a dog’s breed from its image.
To develop such a model, you first need a dataset of annotated images covering the different dog breeds.
- List the names of all the dog breeds for which images should be downloaded:
breed_names_list = [
    "Affenhuahua", "Afghan Hound", "Akita", "Alaskan Malamute",
    "American Bulldog", "Auggie", "Beagle", "Belgian Tervuren",
    "Bichon Frise", "Bocker", "Borzoi", "Boxer", "Bugg", "Bulldog",
]
- After listing the breed names, use the bing_image_downloader API to download images in bulk from bing.com, calling it once for each breed.
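The download step can be sketched as the loop below. This is a minimal sketch rather than the article's original listing. It assumes the library exposes downloader.download and accepts the arguments described under Parameters. The helper name download_breed_images and its download_fn argument are hypothetical, added so the loop can be tried with a stand-in function before touching the network.

```python
def download_breed_images(breed_names, limit=100, output_dir="dataset",
                          download_fn=None):
    """Download up to `limit` images per breed; return the breeds that failed."""
    if download_fn is None:
        # Imported lazily so the loop can be exercised without the library.
        from bing_image_downloader import downloader
        download_fn = downloader.download

    failed = []
    for breed in breed_names:
        try:
            # The breed name is used as the search keyword.
            download_fn(
                breed,
                limit=limit,
                output_dir=output_dir,
                adult_filter_off=False,  # leave the adult-content filter on
                force_replace=False,     # keep an existing breed folder
                timeout=60,              # connection timeout in seconds
            )
        except Exception as exc:
            # One failing query should not abort the whole run.
            print(f"Skipping {breed!r}: {exc}")
            failed.append(breed)
    return failed


# Example call (needs network access and the installed library):
# download_breed_images(["Akita", "Beagle", "Boxer"], limit=50)
```

Returning the list of failed breeds makes it easy to retry only those queries later instead of repeating the whole run.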
That is all there is to it: a loop over the list runs the download for each of the listed breeds, and the downloaded images are stored in separate folders named after their breed, so the folder name serves as the annotation.
Parameters:
query_string: String keyword to search
limit: Number of images to download for each search keyword
output_dir: Save the downloaded images to this directory
adult_filter_off: Set to True to switch the adult-content filter off, or False to keep it on
force_replace: If True and the folder already exists, delete it and download afresh; otherwise save into the existing folder
timeout: Connection timeout in seconds
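Because each breed ends up in its own folder, the folder names can be read back as labels. The sketch below shows one way to do that. It assumes a layout of output_dir/breed name/image files. The function name list_labeled_images and the set of accepted file extensions are illustrative choices, not part of the library.

```python
from pathlib import Path

# Extensions treated as images here; adjust to whatever the download produced.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def list_labeled_images(dataset_dir="dataset"):
    """Return sorted (image_path, breed_label) pairs, one per image file."""
    pairs = []
    for breed_dir in sorted(Path(dataset_dir).iterdir()):
        if not breed_dir.is_dir():
            continue  # ignore stray files next to the breed folders
        for image_path in sorted(breed_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                # The folder name doubles as the class label.
                pairs.append((image_path, breed_dir.name))
    return pairs
```

Web search results are noisy, so it is worth reviewing a sample from each folder before training, since a folder may contain images that do not show the intended breed.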