Class notes

Machine learning model

Uploaded on

14-03-2025

Written in

2024/2025

A functional example of a machine learning model, with the python code.

Institution

KCA University

Course

Data Science

Content preview

#Import Libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

Project Summary: KNN Model for Loan Risk Classification

Objective: The goal of this project is to predict loan repayment risk using the K-Nearest
Neighbors (KNN) algorithm. Our model classifies each loan as either:

• 1 (Fully Paid - Good Loan)
• 0 (Charged Off - Bad Loan)

By analyzing borrower characteristics and financial indicators, the model estimates the
likelihood of loan default, which helps the lender make better lending decisions.
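As a minimal sketch of the binary target described above, the snippet below maps status strings to 1 and 0. The exact label strings ("Fully Paid", "Charged Off") are assumed from the class descriptions, and the extra "Current" row is a hypothetical value added only to show how other statuses could be filtered out.

```python
import pandas as pd

# Hypothetical sample of the raw target column (not taken from loan.csv).
sample = pd.DataFrame(
    {"loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Current"]}
)

# Map the two classes of interest to 1/0; any other status becomes NaN.
status_map = {"Fully Paid": 1, "Charged Off": 0}
sample["target"] = sample["loan_status"].map(status_map)

# Keep only rows with a known binary label and store it as an integer.
labelled = sample.dropna(subset=["target"]).copy()
labelled["target"] = labelled["target"].astype(int)

print(labelled["target"].tolist())  # [1, 0, 1]
```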

Steps

1. Data Preprocessing & Feature Selection: Loaded the dataset and identified missing
values. Dropped irrelevant features. Encoded categorical variables. Converted our target,
loan_status, to binary (1 = Fully Paid, 0 = Charged Off). Scaled numerical
features. Selected the most relevant features.

2. Train-Test Split: Split the data into training (80%) and testing (20%) sets, then
re-checked that loan_status was still binary. An earlier run had produced errors, so we
confirmed it again at this point, even though it might not be necessary.

3. KNN Model Training: Trained a KNN classifier with k=5 to start with. The model's
accuracy was 93.93%.
We noticed that the recall for Charged Off loans was low.

4. Optimizing k for Better Performance: We tuned the value of k (number of neighbors)
using cross-validation and found that k=3 achieved a higher accuracy. Retraining the model
with k=3 gave an accuracy of 94%.
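The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the notebook's actual code: the data is synthetic (generated with make_classification as a stand-in for the preprocessed loan features), the candidate k values are arbitrary, and the scaler is placed inside a pipeline so it is fitted on training folds only, which differs from scaling the whole dataset before splitting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed loan features (NOT the real data).
# weights=[0.15, 0.85] makes class 0 the minority, like charged-off loans.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    weights=[0.15, 0.85], random_state=42,
)

# Step 2: 80/20 split, stratified so both sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
assert set(y_train) == {0, 1}  # sanity check: target is still binary

# Step 3: baseline KNN with k=5; scaling happens inside the pipeline.
baseline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
baseline.fit(X_train, y_train)
baseline_pred = baseline.predict(X_test)
baseline_acc = accuracy_score(y_test, baseline_pred)

# Per-class recall shows whether the minority class is being missed.
print(classification_report(
    y_test, baseline_pred, target_names=["Charged Off", "Fully Paid"]
))

# Step 4: 5-fold cross-validation on the training set for each candidate k.
cv_scores = {}
for k in range(1, 16, 2):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[k] = scores.mean()
best_k = max(cv_scores, key=cv_scores.get)

# Retrain with the selected k and evaluate once on the held-out test set.
final = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final.fit(X_train, y_train)
final_acc = accuracy_score(y_test, final.predict(X_test))
print(f"baseline accuracy = {baseline_acc:.4f}")
print(f"best k = {best_k}, test accuracy = {final_acc:.4f}")
```

With an imbalanced target, accuracy alone can look high while recall for the minority class stays low, so it may be worth comparing candidate k values on recall (for example scoring="recall_macro") as well as accuracy.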

#import data
df=pd.read_csv(r"C:\Users\User\Downloads\loan.csv")
df.head()

C:\Users\User\AppData\Local\Temp\ipykernel_1768\975024387.py:2: DtypeWarning: Columns (0,45) have mixed types. Specify dtype option on import or set low_memory=False.
  df=pd.read_csv(r"C:\Users\User\Downloads\loan.csv")

    id  loan_amnt  funded_amnt  funded_amnt_inv       term int_rate  \
0  NaN     5000.0       5000.0           4975.0  36 months   10.65%
1  NaN     2500.0       2500.0           2500.0  60 months   15.27%
2  NaN     2400.0       2400.0           2400.0  36 months   15.96%
3  NaN    10000.0      10000.0          10000.0  36 months   13.49%
4  NaN     3000.0       3000.0           3000.0  60 months   12.69%

   installment grade sub_grade                 emp_title  ...  \
0       162.87     B        B2                       NaN  ...
1        59.83     C        C4                     Ryder  ...
2        84.33     C        C5                       NaN  ...
3       339.31     C        C1       AIR RESOURCES BOARD  ...
4        67.79     B        B5  University Medical Group  ...

  last_credit_pull_d  collections_12_mths_ex_med  policy_code  application_type  \
0           Jul-2017                         0.0          1.0        INDIVIDUAL
1           Oct-2016                         0.0          1.0        INDIVIDUAL
2           Jun-2017                         0.0          1.0        INDIVIDUAL
3           Apr-2016                         0.0          1.0        INDIVIDUAL
4           Jan-2017                         0.0          1.0        INDIVIDUAL

   acc_now_delinq  chargeoff_within_12_mths  delinq_amnt  pub_rec_bankruptcies  \
0             0.0                       0.0          0.0                   0.0
1             0.0                       0.0          0.0                   0.0
2             0.0                       0.0          0.0                   0.0
3             0.0                       0.0          0.0                   0.0
4             0.0                       0.0          0.0                   0.0

   tax_liens hardship_flag
0        0.0             N
1        0.0             N
2        0.0             N
3        0.0             N
4        0.0             N

[5 rows x 56 columns]
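The DtypeWarning above means pandas found more than one value type in a column while reading the file in chunks. A minimal sketch of the two remedies the warning itself suggests is shown below. The CSV text is a tiny hypothetical stand-in for loan.csv, and treating id as a string column is an assumption made for illustration.

```python
import io

import pandas as pd

# Tiny hypothetical stand-in for loan.csv; the id column mixes blanks,
# text, and numbers, which is the kind of content that triggers the warning.
csv_text = (
    "id,loan_amnt,term\n"
    ",5000.0,36 months\n"
    "abc,2500.0,60 months\n"
    "17,2400.0,36 months\n"
)

# Option 1: read the whole file before inferring each column's dtype.
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the dtype of the mixed column explicitly.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"id": "string"})

print(df_typed.dtypes)
```

Option 2 is usually the more predictable choice, because the column type no longer depends on what values happen to appear in the file.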

#Contents of the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42538 entries, 0 to 42537
Data columns (total 56 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   id                      3 non-null      object
 1   loan_amnt               42535 non-null  float64
 2   funded_amnt             42535 non-null  float64
 3   funded_amnt_inv         42535 non-null  float64
 4   term                    42535 non-null  object
 5   int_rate                42535 non-null  object
 6   installment             42535 non-null  float64
 7   grade                   42535 non-null  object
 8   sub_grade               42535 non-null  object
 9   emp_title               39909 non-null  object
 10  emp_length              41423 non-null  object
 11  home_ownership          42535 non-null  object
 12  annual_inc              42531 non-null  float64
 13  verification_status     42535 non-null  object
 14  issue_d                 42535 non-null  object
 15  loan_status             42535 non-null  object
 16  pymnt_plan              42535 non-null  object
 17  desc                    29240 non-null  object
 18  purpose                 42535 non-null  object
 19  title                   42522 non-null  object
 20  zip_code                42535 non-null  object
 21  addr_state              42535 non-null  object
 22  dti                     42535 non-null  float64
 23  delinq_2yrs             42506 non-null  float64
 24  earliest_cr_line        42506 non-null  object
 25  inq_last_6mths          42506 non-null  float64
 26  mths_since_last_delinq  15609 non-null  float64
 27  mths_since_last_record  3651 non-null   float64
 28  open_acc                42506 non-null  float64
 29  pub_rec                 42506 non-null  float64
 30  revol_bal               42535 non-null  float64
 31  revol_util              42445 non-null  object
 32  total_acc               42506 non-null  float64
 33  initial_list_status     42535 non-null  object
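The listing above shows int_rate and term stored as object (text) columns, and some columns such as mths_since_last_record with very few non-null values. A minimal sketch of how such columns might be cleaned before scaling is given below. The rows are hypothetical values shaped like the preview, and the 50% non-null threshold is an arbitrary choice for illustration, not the one used in the notes.

```python
import pandas as pd

# Hypothetical rows shaped like the preview above (not the full dataset).
raw = pd.DataFrame({
    "int_rate": ["10.65%", "15.27%", "15.96%", "13.49%"],
    "term": ["36 months", "60 months", "36 months", "36 months"],
    "mths_since_last_record": [None, None, None, 12.0],
    "loan_amnt": [5000.0, 2500.0, 2400.0, 10000.0],
})

clean = raw.copy()

# "10.65%" -> 10.65: strip the percent sign, then convert to float.
clean["int_rate"] = clean["int_rate"].str.rstrip("%").astype(float)

# "36 months" -> 36: pull out the digits, then convert to int.
clean["term"] = clean["term"].str.extract(r"(\d+)", expand=False).astype(int)

# Keep only columns with at least half of their values present.
min_non_null = int(0.5 * len(clean))
clean = clean.dropna(axis=1, thresh=min_non_null)

print(clean.dtypes)
```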

Written for

Institution: KCA University
Course: Data Science

Document information

Uploaded on: March 14, 2025
Number of pages: 24
Written in: 2024/2025
Type: Class notes
Professor(s): Njenga
Contains: All classes

Subjects

python
coding
tech
machine learning
programming
knn

Seller: neemaangel (KCA University)
