Diabetes is a serious condition that affects many people today and can lead to further health complications. During the Covid-19 pandemic we also learned that diabetic patients face a much higher risk of severe illness than non-diabetic patients. If deep learning can predict the risk of developing diabetes, early prediction can help people take better care of their health and avoid becoming diabetic.
- Introduction to cAInvas
- Importing the Dataset
- Data Analysis and Data Cleaning
- Trainset-TestSet Creation
- Model Architecture and Model Training
- Introduction to DeepC
- Compilation with DeepC
cAInvas is an integrated development platform for creating intelligent edge devices. Not only can we train our deep learning model using TensorFlow, Keras, or PyTorch, we can also compile it with cAInvas's edge compiler, DeepC, to deploy the working model on edge devices for production. The Diabetes Prediction model discussed here was also developed on cAInvas, and all the dependencies you will need for this project come pre-installed.
cAInvas also offers various other deep learning notebooks in its gallery, which you can use for reference or to gain insight into deep learning. It also has GPU support, which makes it stand out among platforms of its kind.
One of cAInvas's key features is its UseCases Gallery. When working on any of its UseCases, you don't have to gather data manually: the data is provided in tabular form as a CSV file. We will load the dataset into our workspace as a pandas dataframe.
import pandas as pd

df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/diabetes.csv')
To learn more about the data we are dealing with, we run the command df.info(), which produces the following output:
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               2000 non-null   int64
 1   Glucose                   2000 non-null   int64
 2   BloodPressure             2000 non-null   int64
 3   SkinThickness             2000 non-null   int64
 4   Insulin                   2000 non-null   int64
 5   BMI                       2000 non-null   float64
 6   DiabetesPedigreeFunction  2000 non-null   float64
 7   Age                       2000 non-null   int64
 8   Outcome                   2000 non-null   int64
dtypes: float64(2), int64(7)
Next we remove duplicate rows, replace zeros with NaN in the columns where zero is not a valid value, drop all rows containing NaN, and inspect the dataframe again. For this we can run the following commands:
import numpy as np

df.duplicated().sum()
df.drop_duplicates(inplace=True)

columns = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
for col in columns:
    df[col].replace(0, np.nan, inplace=True)
df.dropna(inplace=True)

df.info()
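The effect of these cleaning steps can be illustrated on a tiny synthetic dataframe (named toy here so it does not clobber the real df; the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Tiny synthetic dataframe mirroring a few of the dataset's columns
toy = pd.DataFrame({
    'Glucose':       [148, 0, 148, 85],
    'BloodPressure': [72, 66,  72,  0],
    'Outcome':       [1,  0,   1,  0],
})

toy.drop_duplicates(inplace=True)            # row 2 duplicates row 0
for col in ['Glucose', 'BloodPressure']:
    toy[col] = toy[col].replace(0, np.nan)   # zero is not a valid reading
toy.dropna(inplace=True)                     # drops the rows that had zeros

print(len(toy))                              # prints 1: only the first row survives
```

Only the first row survives: the duplicate is dropped, and the two rows containing a zero become NaN and are removed by dropna.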
The next step is to create the train data and test data. We drop the 'Outcome' column from the dataframe, standardize the remaining features, and store the result in one variable; the 'Outcome' column itself serves as the labels and is stored in another variable. Then we use scikit-learn's train-test split to create the train and test sets.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop('Outcome', axis=1)
X = StandardScaler().fit_transform(X)
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)
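As a quick illustration of what the 80/20 split produces, here is a minimal sketch on synthetic data (the array shapes are made up; only the test_size and random_state match the code above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 synthetic samples with 2 features each, plus 50 labels
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

print(len(X_train), len(X_test))  # prints: 40 10
```

With test_size=0.20, 80% of the rows land in the training set and 20% in the test set; random_state=0 makes the shuffle reproducible.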
After creating the dataset, the next step is to pass the training data to our deep learning model so it learns to classify diabetic and non-diabetic patients. The model architecture used was:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 8) 72
_________________________________________________________________
dense_1 (Dense) (None, 8) 72
_________________________________________________________________
dense_2 (Dense) (None, 4) 36
_________________________________________________________________
dense_3 (Dense) (None, 1) 5
=================================================================
Total params: 185
Trainable params: 185
Non-trainable params: 0
_________________________________________________________________
The loss function used was "binary_crossentropy" and the optimizer was "Adam". The model was trained with the Keras API on a TensorFlow backend and achieved decent accuracy. Here are the training plots for the model:
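The summary above can be reproduced with a Keras sketch like the following. The layer sizes and parameter counts come from the printed summary; the activation functions and the training call are assumptions, since the article only states the layer sizes, loss, and optimizer:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Four Dense layers matching the printed summary
# (8 input features after dropping 'Outcome')
model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=(8,)),  # 8*8 + 8 = 72 params
    layers.Dense(8, activation='relu'),                    # 8*8 + 8 = 72 params
    layers.Dense(4, activation='relu'),                    # 8*4 + 4 = 36 params
    layers.Dense(1, activation='sigmoid'),                 # 4*1 + 1 =  5 params
])                                                         # total:    185 params

model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Training would then look like (epoch count is a placeholder):
# history = model.fit(X_train, y_train,
#                     validation_data=(X_test, y_test), epochs=100)
```

A sigmoid output with binary_crossentropy is the standard pairing for a two-class problem like diabetic vs. non-diabetic, which is why it is assumed here.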