Using TPOT to automate machine learning.
If you have built machine learning models before, you know that it is a time-consuming process: you create several models, compare them to find the best one, and then tune it further to get higher accuracy.
What if I told you that you can find a strong machine learning model for your data in just a few minutes, without writing a lot of code? Yes, you read that right. In this article, I will show you how easily you can test different machine learning models on your data and select the best one.
TPOT stands for Tree-based Pipeline Optimization Tool. It is an open-source Python library that automates the machine learning process by optimizing machine learning pipelines using genetic programming.
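To give a feel for what an evolutionary search over pipelines looks like, here is a toy sketch in plain Python. It is only an illustration and not TPOT's actual algorithm: the search space, the `toy_score` function, and all names are made up for this example, and it uses only selection and mutation, whereas real genetic programming also recombines candidates. The `generations` and `population_size` arguments play the same role as the TPOT parameters of the same names used later in this article.

```python
import random

# Hypothetical search space: each "pipeline" is a (scaler, model, depth) triple.
SCALERS = ["none", "standard", "minmax"]
MODELS = ["tree", "knn", "logreg"]
DEPTHS = list(range(1, 11))


def toy_score(pipeline):
    """Stand-in for cross-validated accuracy. Made up; no model is trained."""
    scaler, model, depth = pipeline
    score = 0.5
    score += 0.2 if scaler == "standard" else 0.0
    score += 0.2 if model == "tree" else 0.0
    score += 0.1 - 0.02 * abs(depth - 4)  # this toy function peaks at depth 4
    return score


def random_pipeline(rng):
    return (rng.choice(SCALERS), rng.choice(MODELS), rng.choice(DEPTHS))


def mutate(pipeline, rng):
    """Re-draw one randomly chosen component of the pipeline."""
    parts = list(pipeline)
    i = rng.randrange(3)
    parts[i] = rng.choice([SCALERS, MODELS, DEPTHS][i])
    return tuple(parts)


def evolve(generations=5, population_size=50, seed=42):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half of the population.
        population.sort(key=toy_score, reverse=True)
        survivors = population[: population_size // 2]
        # Variation: refill the population with mutated copies of survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=toy_score)


best = evolve()
print(best, round(toy_score(best), 3))
```

Because the best candidates always survive into the next generation, the top score can only stay the same or improve as the number of generations grows. The price is that every generation means more candidates to evaluate, which is where the running time goes.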
So let's get started.
TPOT needs a few dependent libraries in order to work, so we will install all of them first. In this article we are using Google Colab; copy the commands given below into a notebook cell to install the required dependencies.
!pip install TPOT
!pip install dask==2.20.0 dask-glm==0.2.0 dask-ml==1.0.0
!pip install tornado==5.0
!pip install distributed==2.2.0
!pip install xgboost==1.1.0
!pip install fsspec
Now we will import the libraries that will help us automate the machine learning process.
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
Now we will load the famous Iris dataset, convert it to floating-point values, and split it into training and test sets.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data.astype(np.float64),
    iris.target.astype(np.float64),
    train_size=0.75,
    test_size=0.25,
    random_state=42,
)
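Before handing the data to TPOT, it can help to check what the split produced. The sketch below repeats the split so that it runs on its own, then prints the shapes and the class counts of the training set. The printing step is my addition for illustration and is not required by TPOT.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data.astype(np.float64),
    iris.target.astype(np.float64),
    train_size=0.75,
    test_size=0.25,
    random_state=42,
)

# Iris has 150 samples with 4 features each, so a 75/25 split
# gives 112 training rows and 38 test rows.
print(X_train.shape, X_test.shape)

# How many training samples fall into each of the three classes.
classes, counts = np.unique(y_train, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```

With only 38 test samples, every single misclassification moves the test accuracy by roughly 2.6 percentage points, so small differences between runs should not be over-interpreted.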
This is the final step, in which we create the TPOT classifier and let it find the best-performing model pipeline with the best parameters.
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42, use_dask=True)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
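As far as I know, the number printed by `score` for a classifier is the accuracy on the test set unless a different `scoring` argument is passed. Accuracy is simply the fraction of predictions that match the true labels, which can be sketched in a few lines of plain Python. The labels below are hypothetical and are not output from the TPOT run above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration only.
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 1, 2]
print(accuracy(y_true, y_pred))  # 7 of 8 correct -> 0.875
```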
Here you can clearly see how we used TPOT to find the best-performing model along with its best parameters.
Go ahead, try this, and let me know about your experience in the response section.
This article is in collaboration with
.
Thanks for reading! If you want to get in touch with me, feel free to reach me at hmix13@gmail.com or on my LinkedIn profile. You can view my GitHub profile for different data science projects and package tutorials. Also, feel free to explore my profile and read the other articles I have written on data science.