Built a Logistic Regression model that predicts whether a given candidate is prone to defaulting on their loan
Lending has long been an important part of daily life. Although it can be profitable for both lenders and borrowers, lenders face a major dilemma when sanctioning a loan, especially if the borrower has an insufficient or non-existent credit history. If the borrower defaults on a loan, the lender takes a loss. So, knowing whether a borrower will repay their debt or default on it is a crucial problem for all lending organizations.
In this article we shall build a Logistic Regression model that predicts whether a given candidate will default on their loan.
The model is trained on the Credit Risk dataset provided by Murilão on Kaggle.
Preparing the data:
1. First I used the pandas library to read the CSV file:
# Loading CSV file
import pandas as pd

data = pd.read_csv('original.csv')
# Dropping rows with NaN values
input = data.dropna()
2. Then I dropped the rows that contain NaN values (the dropna() call above removes rows, not columns).
3. Then I split the dataset into train and test sets using train_test_split from sklearn.
# Splitting dataset into train and test
from sklearn.model_selection import train_test_split
y = input['default']
X = input.drop(columns=['default'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
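Because defaulters are usually a minority class, it can help to keep the class ratio the same in both splits and to fix the random seed so the split is reproducible. The sketch below uses the `stratify` and `random_state` parameters of `train_test_split` on synthetic data. The data, the 12.5% default rate, and the seed value 42 are assumptions made for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 3))
y_demo = np.array([1] * 250 + [0] * 1750)  # 12.5% defaulters

# stratify=y_demo keeps the share of defaulters equal in both splits;
# random_state makes the split reproducible from run to run
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

print(len(y_te))    # 400
print(y_te.mean())  # 0.125
print(y_tr.mean())  # 0.125
```

Without `stratify`, a random split of a small, imbalanced dataset may leave the test set with noticeably more or fewer defaulters than the training set, which makes the reported metrics harder to trust.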
Creating the model:
In model.py I have created two functions:
1. train(): This function is called to train the model on a given dataset
# Function to train the model
from sklearn.linear_model import LogisticRegression

def train(X_train, y_train):
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model
We use Logistic Regression in this example. Logistic Regression is a widely used algorithm for classification problems. It is easy to implement and gives satisfactory results with a few lines of code. Logistic regression is similar to linear regression, except that the linear output is passed through the sigmoid (logistic) function, which turns it into a probability between 0 and 1.
2. predict(): This function is used to make predictions on given data
# Function for prediction
def predict(model, X_test):
    return model.predict(X_test)
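The earlier remark that logistic regression is linear regression with a different function can be made concrete with a few lines of plain Python. This is a minimal sketch of the idea, not scikit-learn's internal implementation; the weights, bias, and feature values are invented for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(weights, bias, features):
    """Weighted sum of the features, as in linear regression."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical weights and features, invented for illustration
weights = [0.8, -1.2]
bias = -0.5
features = [1.0, 0.5]

z = linear_score(weights, bias, features)  # -0.5 + 0.8 - 0.6 = -0.3
p_default = sigmoid(z)                      # probability of class 1
label = int(p_default >= 0.5)               # a 0.5 cut-off turns it into a 0/1 label

print(round(z, 2), round(p_default, 3), label)
```

A negative score gives a probability below 0.5 and a positive score gives one above 0.5, so for a binary problem the 0.5 cut-off is equivalent to checking the sign of the linear score.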
Training the model:
Once the data is split into train and test sets, we call the train function from the model.py file to fit the model on X_train and y_train.
# Training the model
model = train(X_train, y_train)
Prediction:
Upon completion of training, we pass the test values to the predict function of the model.py file to see how the model predicts on data it has not seen:
# Making predictions
pred = predict(model, X_test)
print(pred)
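`predict` returns hard 0/1 labels. When probabilities are more useful, for example to flag more potential defaulters by lowering the cut-off, scikit-learn's `LogisticRegression` also offers `predict_proba`. The sketch below is not part of the original model.py: the helper `predict_with_threshold`, the synthetic data, and the 0.3 threshold are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def predict_with_threshold(model, X, threshold=0.5):
    """Hypothetical helper: label a row 1 when P(class 1) >= threshold."""
    proba_default = model.predict_proba(X)[:, 1]  # column 1 = probability of class 1
    return (proba_default >= threshold).astype(int)

# Synthetic, imbalanced stand-in data (assumption for illustration)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, weights=[0.85, 0.15], random_state=0
)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

default_cut = predict_with_threshold(demo_model, X_demo, threshold=0.5)
lower_cut = predict_with_threshold(demo_model, X_demo, threshold=0.3)

# A lower threshold can only flag the same rows or more rows as defaulters
print(default_cut.sum(), lower_cut.sum())
```

Lowering the threshold trades precision for recall: more real defaulters are caught, at the cost of more false alarms. Which trade-off is acceptable depends on the lender.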
Performance Metrics:
To measure the performance we look at the classification_report and the confusion_matrix:
# Printing performance metrics
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test, pred))
print('===================================')
print(confusion_matrix(y_test, pred))

Output:
              precision    recall  f1-score   support

           0       0.94      0.99      0.96       350
           1       0.87      0.52      0.65        50

    accuracy                           0.93       400
   macro avg       0.90      0.75      0.81       400
weighted avg       0.93      0.93      0.92       400

===================================
[[346   4]
 [ 24  26]]
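The figures in the report can be recomputed by hand from the confusion matrix, which makes them easier to interpret. The sketch below assumes scikit-learn's convention that rows are true classes and columns are predicted classes, and that class 1 means "default".

```python
# Confusion matrix from the run above: rows = true class, columns = predicted class
tn, fp = 346, 4    # true non-defaulters: correctly / wrongly classified
fn, tp = 24, 26    # true defaulters: missed / caught

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)   # of predicted defaulters, how many really defaulted
recall_1 = tp / (tp + fn)      # of real defaulters, how many were caught
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.2f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
# accuracy=0.93 precision=0.87 recall=0.52 f1=0.65
```

Always predicting "no default" would already score 350/400 = 87.5% accuracy on this test set, so the recall for class 1 is the more telling number here.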
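As you can see, with this dataset we got 93% accuracy in predicting whether a given individual will default on the loan. Note, however, that recall for the default class is only 0.52: the model caught 26 of the 50 actual defaulters in the test set. You can find the code on GitHub.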