Loan Defaulter Risk Prediction

Built a Logistic Regression model that predicts whether a given candidate is prone to defaulting on their loan

Image from https://www.reduceloans.com.au/wp-content/uploads/2020/10/Split-loans-1400×660.jpg

Lending has become an important part of our daily lives. Although it can be quite profitable for both lenders and borrowers, lenders face a real dilemma when sanctioning a loan, especially when borrowers have insufficient or non-existent credit histories. If a borrower defaults on a loan, the lender takes a loss. So knowing whether a borrower will be able to pay back their debt or will default on it is a crucial problem for every lending organization.
In this article we will build a Logistic Regression model that predicts whether a given candidate will default on their loan.

This model is trained on the Credit risk dataset provided by Murilão on Kaggle.

Preparing the data:

1. First, I use the pandas library to read the CSV file:

import pandas as pd

# Loading CSV file
data = pd.read_csv('original.csv')
# Dropping rows with NaN values
input = data.dropna()

2. Then I remove every row that contains a NaN value. This is what dropna() does by default: it drops rows, not columns.

3. Then I split the dataset into training and test sets using train_test_split from sklearn.

from sklearn.model_selection import train_test_split

# Splitting dataset into train and test sets
y = input['default']
X = input.drop(columns=['default'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
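The preparation steps above can be sketched end to end on a small synthetic DataFrame. This is an illustrative sketch, not the original script: the toy columns income, age and loan are hypothetical stand-ins (only default comes from the article), and the stratify=y and random_state arguments are additions. It shows that dropna() removes rows by default while dropna(axis=1) would remove whole columns, and that a stratified split keeps the share of defaulters the same in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for original.csv; the feature columns are
# assumptions, only the 'default' target column comes from the article.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "income": rng.normal(45000, 14000, n),
    "age": rng.normal(40, 13, n),
    "loan": rng.normal(4500, 3000, n),
    "default": (rng.random(n) < 0.15).astype(int),
})
data.loc[[3, 17, 42], "age"] = np.nan  # inject a few missing values

# dropna() with the default axis=0 drops ROWS that contain a NaN.
rows_dropped = data.dropna()
# dropna(axis=1) would instead drop every COLUMN that contains a NaN.
cols_dropped = data.dropna(axis=1)

print(rows_dropped.shape)  # (197, 4)
print(cols_dropped.shape)  # (200, 3)

# A stratified split keeps the default/non-default ratio equal in both sets,
# which matters when defaulters are the minority class.
y = rows_dropped["default"]
X = rows_dropped.drop(columns=["default"])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_train), len(X_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```

Dropping rows is usually the safer choice here: dropping the age column would throw away a feature for every applicant just because a few values are missing.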

Creating the model:

In model.py I have created two functions:

1. train(): This function is called to train the model on a given dataset

from sklearn.linear_model import LogisticRegression

# Function to train the model
def train(X_train, y_train):
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model

We use Logistic Regression in this example. Logistic Regression is a widely used algorithm for classification problems: it can be implemented in a few lines of code and often gives satisfactory results. It is similar to linear regression, except that the weighted sum of the features is passed through the logistic (sigmoid) function, which turns it into a probability between 0 and 1.

2. predict(): This function is used to make predictions on given data

# Function for prediction
def predict(model, X_test):
    return model.predict(X_test)
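To make the link between linear regression and the logistic function concrete: logistic regression computes the same linear combination z = Xw + b, then passes it through the sigmoid 1 / (1 + e^(-z)) to get the probability of class 1. The sketch below uses synthetic data from make_classification (an assumption, not the article's dataset) and a sigmoid helper of our own; it rebuilds scikit-learn's probabilities from the fitted coef_ and intercept_ and checks that predict() amounts to a 0.5 cutoff on them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption; not the credit-risk dataset).
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

def sigmoid(z):
    # Logistic function: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Linear part, exactly as in linear regression: z = Xw + b
z = X @ model.coef_.ravel() + model.intercept_[0]
# Logistic regression passes z through the sigmoid to get P(y = 1)
manual_proba = sigmoid(z)
print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))  # True

# predict() returns class 1 when z > 0, i.e. when the probability exceeds 0.5
manual_labels = (z > 0).astype(int)
print(np.array_equal(manual_labels, model.predict(X)))  # True
```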

Training the model:

Once the data is split into training and test sets, we call the train() function from model.py to fit the model on the X_train and y_train data.

# Training the model
model = train(X_train, y_train)

Prediction:

Upon completion of training, we pass the test values to the predict() function in model.py to see how well the model predicts on data it has not seen.

# making Prediction
pred = predict(model, X_test)
print(pred)
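predict() returns hard 0/1 labels. If you also want the model's estimated probability of default, scikit-learn's predict_proba exposes it, and a lender could then choose its own cutoff. The sketch below illustrates this on imbalanced synthetic data from make_classification (an assumption, roughly mimicking the 1-in-8 default rate of the test set later in the article); the 0.3 cutoff is a hypothetical value chosen only to show the effect, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data (about 12.5% positives); an assumption standing in
# for the credit-risk dataset.
X, y = make_classification(n_samples=2000, n_features=4, n_informative=3,
                           n_redundant=1, weights=[0.875], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Estimated probability of class 1 (default) for each test row
proba = model.predict_proba(X_test)[:, 1]

# A 0.5 cutoff reproduces predict(); a lower, hypothetical cutoff of 0.3
# flags more applicants as likely defaulters.
pred_default = (proba > 0.5).astype(int)
pred_lower = (proba > 0.3).astype(int)

print("recall on defaulters @0.5:", recall_score(y_test, pred_default))
print("recall on defaulters @0.3:", recall_score(y_test, pred_lower))
```

Lowering the cutoff can only add positive predictions, so recall on defaulters never goes down; the usual price is more false alarms, i.e. lower precision.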

Performance Metrics:

To measure performance, we look at the classification_report and the confusion_matrix from sklearn.metrics:

from sklearn.metrics import classification_report, confusion_matrix

# Printing performance metrics
print(classification_report(y_test, pred))
print('===================================')
print(confusion_matrix(y_test, pred))

Output:

              precision    recall  f1-score   support

           0       0.94      0.99      0.96       350
           1       0.87      0.52      0.65        50

    accuracy                           0.93       400
   macro avg       0.90      0.75      0.81       400
weighted avg       0.93      0.93      0.92       400

===================================
[[346   4]
 [ 24  26]]
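The headline numbers in the report can be recomputed by hand from the confusion matrix, which makes it clearer where the 93% comes from. A minimal sketch, assuming scikit-learn's layout (rows are actual labels, columns are predicted labels, ordered 0 then 1, with 1 meaning default):

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted.
cm = np.array([[346, 4],
               [24, 26]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()        # 372 / 400
precision_default = tp / (tp + fp)     # 26 / 30
recall_default = tp / (tp + fn)        # 26 / 50
recall_non_default = tn / (tn + fp)    # 346 / 350

# Accuracy of a trivial model that always predicts "no default"
baseline = cm[0].sum() / cm.sum()      # 350 / 400

print(round(accuracy, 2))            # 0.93
print(round(precision_default, 2))   # 0.87
print(round(recall_default, 2))      # 0.52
print(round(recall_non_default, 2))  # 0.99
print(baseline)                      # 0.875
```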

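As you can see, on this dataset the model reaches 93% accuracy in predicting whether a given individual will default on their loan. Keep in mind that the test set is imbalanced (350 non-defaulters vs 50 defaulters): the model correctly flags only 26 of the 50 actual defaulters (recall of 0.52 for class 1), so accuracy alone overstates how well it catches risky borrowers. You can find the code on GitHub.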
