A Simple Regression Model Analysis with Tensorflow/Deep Learning Project

Develop a basic data science project with Tensorflow

Today I found myself learning new skills with TensorFlow, and I thought about how I could build a project with this helpful open-source library. When I learn something, I like to work with real datasets, because that is the best way to consolidate what I have learned.


1- Finding a real dataset

Many platforms share free datasets nowadays, but here I'll talk about the most popular one, Kaggle, where you can find a wide variety of data. If you are interested in machine learning or deep learning, you should definitely know this website.

For this project, I chose the dataset of Amazon's Top 50 Bestselling Books from 2009 to 2019.


2-Advantages and Definition of Tensorflow

Before getting to TensorFlow's advantages, here is a quick description of it.
TensorFlow is currently one of the most widely used deep learning frameworks in the world. It is a free and open-source software library for dataflow and differentiable programming across a range of tasks, and it is widely used to train machine learning models. Using TensorFlow is quite simple, as you will see in the examples below, where we build a model step by step.

Unlike traditional numerical libraries, TensorFlow uses a dataflow graph, a common programming model in cloud computing and machine learning, to express and organize the computational workflow. It then maps the mathematical operations in the graph to different computing devices (e.g. GPUs, TPUs, and CPUs).

Figure: The architecture of TensorFlow

This architecture provides a uniform API that makes low-level modules and devices transparent to users. This not only saves us from the tedious and demanding work of parallel programming, but also makes it possible to move an application from one computing platform to another with virtually no changes.
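To make the dataflow-graph idea more concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the Node class and evaluate function are hypothetical names for illustration only. Each node is an operation, edges carry values between operations, and a node is computed only once all of its inputs are available.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A toy graph node: an operation plus the names of the nodes feeding it."""
    name: str
    op: Callable[..., float]
    inputs: List[str] = field(default_factory=list)


def evaluate(nodes: List[Node], feeds: Dict[str, float]) -> Dict[str, float]:
    """Evaluate every node once its inputs are ready (a simple dependency-driven pass)."""
    values = dict(feeds)
    pending = list(nodes)
    while pending:
        progressed = False
        for node in list(pending):
            if all(name in values for name in node.inputs):
                values[node.name] = node.op(*(values[n] for n in node.inputs))
                pending.remove(node)
                progressed = True
        if not progressed:
            raise ValueError("graph has a cycle or a missing input")
    return values


# y = relu(w * x + b), expressed as a graph of small operations.
# "relu" is listed first on purpose: execution order comes from the dependencies.
graph = [
    Node("relu", lambda z: max(z, 0.0), ["add"]),
    Node("mul", lambda w, x: w * x, ["w", "x"]),
    Node("add", lambda m, b: m + b, ["mul", "b"]),
]
result = evaluate(graph, {"w": 2.0, "x": 3.0, "b": -1.0})
print(result["relu"])  # 5.0
```

Because each node depends only on its inputs, independent nodes could in principle run on different devices, which is the property TensorFlow relies on when it maps graph operations to CPUs, GPUs, or TPUs.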

Let’s look at the main advantages:

Quick Model Creation
Scalability
Robust Machine Learning Generation
Pipelining
Community Support etc.

3.1-Import the necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

Now we can read and inspect our dataset:

df = pd.read_excel("bestsellers.xlsx")
df.head()

df.describe()
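To show what df.describe() reports, here is a small sketch on a made-up price column (the values are hypothetical, not taken from the Kaggle dataset). For a numeric column it returns the count, mean, standard deviation, minimum, the 25%, 50%, and 75% quantiles, and the maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, only to illustrate the output of describe()
toy = pd.DataFrame({"Price": [4, 8, 10, 12, 16]})
summary = toy["Price"].describe()
print(summary)

# describe() reports the sample standard deviation (ddof=1), not the population one
assert np.isclose(summary["std"], np.std(toy["Price"].to_numpy(), ddof=1))
```

Comparing mean and 50% (the median) is a quick way to spot skew: on a real price column, a mean well above the median suggests a few very expensive books.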

3.2-Visualization

After confirming that the data loaded correctly, I moved on to visualization with Seaborn. Seaborn is an amazing visualization library for statistical graphics in Python. It provides attractive default styles and color palettes that make statistical plots more appealing. It is built on top of matplotlib and is closely integrated with pandas data structures.

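sbn.countplot(x=df["Price"])

plt.figure(figsize=(7,5))
# distplot is deprecated in recent seaborn versions; histplot with kde=True draws a similar chart
sbn.histplot(x=df["Price"], kde=True)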

sbn.scatterplot(x="Reviews",y="Price",data=df)

Looking at the last chart, there does not seem to be a clear relationship between the number of reviews and the price.
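One way to back up an impression from a scatter plot with a number is the Pearson correlation coefficient, which ranges from -1 to 1 and is close to 0 when there is no linear relationship. The sketch below uses small made-up arrays (hypothetical values, not the real Reviews and Price columns); on the actual data, the equivalent pandas call would be df["Reviews"].corr(df["Price"]).

```python
import numpy as np

# Hypothetical values, only to show the calculation
reviews = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
price_linear = 2.0 * reviews + 5.0                        # perfectly linear in reviews
price_unrelated = np.array([12.0, 8.0, 15.0, 8.0, 12.0])  # no linear trend


def pearson(a, b):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))


print(pearson(reviews, price_linear))     # 1.0
print(pearson(reviews, price_unrelated))  # 0.0
```

Keep in mind that a coefficient near 0 only rules out a linear relationship; a non-linear pattern could still be present, which is one reason to look at the scatter plot as well.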

In the examples above, we saw how to plot the price distribution and the reviews-price relationship with Seaborn; as you can see, it is quick and easy.

3.3- Training and Testing Data


Dataset splitting with the Sklearn train_test_split function

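from sklearn.model_selection import train_test_split

# Assumption: the post does not show how x and y were built. Since the model
# predicts book prices, Price is used as the target and the other numeric
# columns as features; adjust the column names to your copy of the data.
y = df["Price"]
x = df[["User Rating", "Reviews", "Year"]]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=10)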

Scikit-learn (imported as sklearn) is a Python library that offers tools for data preprocessing, classification, regression, clustering, and model selection.

sklearn.model_selection is the module that groups scikit-learn's model selection tools: they help you build a model on the data you have and then measure how well it performs on new data. Selecting a proper model allows you to generate accurate results when making predictions.

To do that, you need to train your model by using a specific dataset. Then, you test the model against another dataset.

The train_test_split function splits a single dataset for two different purposes: training and testing. The training subset is used to build (fit) the model. The testing subset is used to apply the model to data it has not seen and evaluate its performance. Here test_size=0.3 keeps 30% of the rows for testing, and random_state=10 makes the shuffle reproducible.

len(x_train)  # output: 109
len(x_test)   # output: 47
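The two lengths above follow from how train_test_split sizes the subsets when test_size is a float: scikit-learn rounds the test-set size up (ceil) and, when train_size is not given, assigns the remaining samples to the training set. The sketch below reproduces that arithmetic in plain Python; split_sizes is a hypothetical helper, and the total of 156 rows is inferred from 109 + 47 rather than stated in the post.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Mirror of the sizing rule: the test set is rounded up, the rest goes to training."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(156, 0.3))  # (109, 47), since 0.3 * 156 = 46.8 is rounded up to 47
```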

I checked that x_train and x_test were split correctly and found no problems, so I moved on to the preprocessing step with MinMaxScaler.

from sklearn.preprocessing import MinMaxScaler

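MinMaxScaler transforms features by scaling each feature to a given range (by default [0, 1]). The scaler is fit on the training data only (fit_transform), and the same scaling is then applied to the test data (transform), so no information from the test set leaks into training.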

scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
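Under the hood, min-max scaling maps each feature value x to (x - min) / (max - min), where min and max are learned per column from the training data only. The NumPy sketch below, using made-up numbers, mirrors what fit_transform and transform do for the default range [0, 1].

```python
import numpy as np

# Hypothetical feature matrices: rows are samples, columns are features
x_train = np.array([[1.0, 100.0],
                    [3.0, 300.0],
                    [5.0, 500.0]])
x_test = np.array([[2.0, 600.0]])

# "fit": learn the per-column minimum and range from the training data only
col_min = x_train.min(axis=0)
col_range = x_train.max(axis=0) - col_min

# "transform": apply the same training statistics to both sets
x_train_scaled = (x_train - col_min) / col_range
x_test_scaled = (x_test - col_min) / col_range

print(x_train_scaled)  # each column becomes 0.0, 0.5, 1.0
print(x_test_scaled)   # [[0.25 1.25]]: 600 lies above the training maximum
```

Note that test values outside the training range land outside [0, 1], which is expected. MinMaxScaler also guards against constant columns (zero range), which this sketch does not.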

Now it is time to create my model.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

You can create a Sequential model by passing a list of layers to the Sequential constructor:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

You can also create a Sequential model incrementally via the add() method, which is what I chose for my own model:

model = Sequential()
model.add(Dense(12, activation="relu"))
model.add(Dense(12, activation="relu"))
model.add(Dense(12, activation="relu"))
model.add(Dense(12, activation="relu"))
model.add(Dense(1))

# pass optimizer by name: default parameters will be used
model.compile(optimizer="adam", loss="mse")
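Each Dense layer computes activation(inputs @ kernel + bias). The NumPy sketch below mimics the forward pass of the model above (four hidden layers of 12 ReLU units and one linear output unit) for a small batch. The weights are random (not Keras's default initializer), and the number of input features (3) is an assumption for illustration; Keras infers the input size from the data and learns the weights during fit().

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def relu(z):
    return np.maximum(z, 0.0)


def dense(x, kernel, bias, activation=None):
    """Forward pass of one fully connected layer: activation(x @ kernel + bias)."""
    z = x @ kernel + bias
    return activation(z) if activation is not None else z


n_features = 3                     # assumption: the post does not list the input columns
layer_sizes = [12, 12, 12, 12, 1]  # same widths as the Sequential model above

# Randomly initialised parameters (Keras would learn these during training)
params = []
fan_in = n_features
for units in layer_sizes:
    params.append((rng.normal(scale=0.1, size=(fan_in, units)), np.zeros(units)))
    fan_in = units


def forward(x):
    for i, (kernel, bias) in enumerate(params):
        is_output = i == len(params) - 1
        # hidden layers use ReLU, the output layer stays linear for regression
        x = dense(x, kernel, bias, activation=None if is_output else relu)
    return x


batch = rng.uniform(size=(5, n_features))  # 5 samples already scaled to [0, 1]
predictions = forward(batch)
print(predictions.shape)  # (5, 1): one predicted price per sample
```

With 3 input features, these layer sizes give 3*12+12 + 3*(12*12+12) + 12*1+1 = 529 trainable parameters, which is the total model.summary() would report under the same assumption.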

The optimizer and the loss are the two key arguments you pass when compiling a Keras model.

Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on the training data. The loss "mse" (mean squared error) is a standard choice for regression.
Optimizer: str (name of optimizer) or optimizer object
Loss: str (name of objective function) or objective function
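To show what "adam" does when it updates weights, here is a NumPy sketch of the textbook Adam update: it keeps moving averages of the gradient and of the squared gradient, corrects them for their zero initialisation, and scales each step accordingly. beta_1 = 0.9, beta_2 = 0.999, and epsilon = 1e-7 match the tf.keras Adam defaults, but the sketch minimises a toy one-parameter loss (w - 3)^2 rather than the network's MSE, and uses a larger learning rate (0.1 instead of the default 0.001) so it converges quickly. adam_minimise is a hypothetical helper name.

```python
import numpy as np


def adam_minimise(grad_fn, w0, lr=0.1, beta_1=0.9, beta_2=0.999, eps=1e-7, steps=1000):
    """Textbook Adam on a single scalar parameter."""
    w = float(w0)
    m = 0.0  # first moment: moving average of gradients
    v = 0.0  # second moment: moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        m_hat = m / (1 - beta_1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta_2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3
w_final = adam_minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(f"{w_final:.3f}")
```

Dividing by the square root of the second moment makes the step size roughly independent of the gradient's scale, which is one reason Adam usually works well with its default settings.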

3.4-Train the model for a fixed number of epochs

model.fit(x=x_train, y = y_train,validation_data=(x_test,y_test),batch_size=250,epochs=300)

batch_size: int. Number of samples per gradient update.
validation_data: tuple (x, y) used as held-out validation data. It overrides validation_split.
epochs: int. Number of epochs to train the model, where an epoch is one iteration over the entire training data.
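The batch size determines how many weight updates happen per epoch: the training data is split into ceil(n_samples / batch_size) batches, the last one possibly smaller. With the 109 training rows reported above and batch_size=250, every epoch is a single full-batch update, so 300 epochs mean 300 updates in total. A quick check in Python, comparing with 32, the Keras default when batch_size is not given (updates_per_epoch is a hypothetical helper):

```python
import math


def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches, and therefore gradient updates, in one pass over the data."""
    return math.ceil(n_samples / batch_size)


n_train, epochs = 109, 300  # values used in this post
for batch_size in (250, 32):
    per_epoch = updates_per_epoch(n_train, batch_size)
    print(batch_size, per_epoch, per_epoch * epochs)
# batch_size=250 -> 1 update per epoch, 300 in total
# batch_size=32  -> 4 updates per epoch, 1200 in total
```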

lossData = pd.DataFrame(model.history.history)
lossData.plot()

model.fit() returns a History object. Its history attribute is a record of training loss values at successive epochs, as well as validation loss values (if applicable). The same object is also stored as model.history, which is what the code above reads.
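The history attribute is a plain dictionary mapping metric names to one value per epoch, with keys such as "loss" and, when validation_data is given, "val_loss". The sketch below uses a short, made-up history of the same shape (the numbers are hypothetical) to show how the DataFrame is built and how to find the epoch with the lowest validation loss.

```python
import pandas as pd

# Hypothetical history with the same structure as model.history.history
history = {
    "loss":     [120.0, 80.0, 60.0, 55.0, 54.0],
    "val_loss": [130.0, 90.0, 70.0, 72.0, 75.0],
}

loss_data = pd.DataFrame(history)  # one row per epoch, one column per metric
best_epoch = int(loss_data["val_loss"].idxmin()) + 1  # +1 because epochs are counted from 1
print(loss_data)
print("lowest validation loss at epoch", best_epoch)  # epoch 3
# loss_data.plot() would draw both curves, as in the post
```

A validation loss that starts rising while the training loss keeps falling (epochs 4 and 5 in this made-up example) is a common sign of overfitting.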

3.5-Prediction Series Plotting

from sklearn.metrics import mean_squared_error, mean_absolute_error

mean_absolute_error: Mean absolute error regression loss
mean_squared_error: Mean squared error regression loss
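Both metrics compare the true prices y with the predictions y_hat over the n test samples: MAE = mean(|y - y_hat|) and MSE = mean((y - y_hat)^2). The NumPy sketch below computes them by hand on made-up numbers; on the real predictions, the equivalent calls would be mean_absolute_error(y_test, pred) and mean_squared_error(y_test, pred). Taking the square root of the MSE (RMSE) puts the error back in the same unit as the price.

```python
import numpy as np

# Hypothetical true prices and model predictions
y_true = np.array([10.0, 12.0, 8.0, 20.0])
y_pred = np.array([11.0, 10.0, 8.0, 23.0])

errors = y_true - y_pred       # [-1, 2, 0, -3]
mae = np.mean(np.abs(errors))  # (1 + 2 + 0 + 3) / 4 = 1.5
mse = np.mean(errors ** 2)     # (1 + 4 + 0 + 9) / 4 = 3.5
rmse = np.sqrt(mse)            # about 1.87, in the same unit as the price

print(mae, mse, rmse)
```

If you compute these by hand on Keras output, flatten the predictions first (pred.ravel()): model.predict returns shape (n, 1), and subtracting that from a 1-D y_test would silently broadcast to an (n, n) matrix.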

pred = model.predict(x_test)
plt.scatter(y_test,pred)
plt.plot(y_test,y_test,"g-*")

A scatter plot is a diagram where each value in the dataset is represented by a dot.

Matplotlib's scatter method needs two arrays of the same length, one for the values of the x-axis and one for the values of the y-axis:
The x-axis array is y_test, the actual book prices.
The y-axis array is pred, the predicted book prices.
The green line plots y_test against itself (y = x), so points close to it are accurate predictions.

In this basic project, I discovered how to make regression predictions with TensorFlow, using the scikit-learn Python library to split, scale, and evaluate the data.