Loans have played a significant role in the business world for many years. They are profitable and
beneficial for both lenders and borrowers; however, they carry a significant risk, which in the
domain of lending is referred to as Credit Risk.
Ever since the 2008 financial crisis, financial institutions have been more stringent and careful in
order to protect themselves from hefty losses and bad decisions. Regulators have also become
stricter and require minute details, making it imperative for any financial institution to
forecast its recovery and default rates.
Managing risk is a priority for any business, and the credit risk involved in lending can be
reduced using analytics.
BUSINESS PROBLEM
Banks or Loan Financing Companies make money by providing loans and earning interest
income from those loans. The types of loans a commercial bank can issue vary and may include
mortgages, auto loans, business loans, and personal loans. A commercial bank may specialize in
just one or a few types of loans.
For most banks, loans are the primary use of their funds and the principal way in which they earn
income. Loans are usually made for fixed terms, at fixed or floating rates, and are typically secured
by a mortgage or other valuable assets. Banks also make loans with variable or adjustable
interest rates, and borrowers can often repay loans early with little or no penalty. However, there
are cases where the borrower, willingly or unwillingly, does not repay the loan; this is known as
loan defaulting.
Loan Defaulting
Defaulting on a loan happens when repayments are not made for a certain period of time. When a
loan defaults, it is sent to a debt collection agency whose job is to contact the borrower and
receive the unpaid funds. Defaulting will drastically reduce one’s credit score, affect his/her
ability to receive future credit, and can lead to the seizure of personal property.
Loan default occurs when a borrower fails to pay back a debt according to the initial
arrangement. In the case of most consumer loans, this means that successive payments have been
missed over the course of weeks or months. Fortunately, lenders and loan servicers usually allow
a grace period before penalizing the borrower after missing one payment. The period between
missing a loan payment and having the loan default is known as delinquency. The delinquency
period gives the debtor time to avoid default by contacting their loan servicer or making up
missed payments.
The consequences of defaulting on a loan of any type are severe and should be avoided at all
costs. A default directly affects both the defaulter and the bank, and people who hold accounts
with the bank are also affected negatively. Banks therefore try to minimize the risk and take
calculated decisions.
Impact of Loan Defaulting on Banks
Usually a bank checks all necessary personal and income-related documents. It tries its best
to make sure that the borrower repays on time.
A possible effect of loan defaults is on shareholders’ earnings. Dividend payments are based
on the bank’s performance in terms of net profit. Since loan defaults have an adverse effect on
the profitability of banks, they can affect the amount of dividend paid to shareholders.
How a bank starts facing losses when borrowers start defaulting
Through the model, we plan to predict the likelihood of a borrower defaulting on the loan
payment, using the available data on the borrower. This would help in better decision making,
taking calculated risks, and avoiding losses to the bank and its customers.
OBJECTIVE
To identify and predict the likelihood of a customer defaulting on the repayment of a loan given
to them, using the multivariate techniques of Decision Trees and Logistic Regression.
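As a rough illustration of what "likelihood of defaulting" means under logistic regression, the sketch below passes a weighted sum of borrower attributes through the logistic function to obtain a probability between 0 and 1. The coefficients, the intercept, the 0.5 cut-off, and the helper name default_probability are all hypothetical and are not fitted to the dataset used in this report.

```python
import math

def default_probability(features, weights, intercept):
    """Logistic model: P(default) = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    score = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients for illustration only (NOT estimated from the data).
weights = {"dti": 0.03, "emp_length_int": -0.05}
intercept = -2.0

# One borrower described by two of the dataset's attributes.
borrower = {"dti": 18.0, "emp_length_int": 4}

p = default_probability(borrower, weights, intercept)
# Assumed cut-off of 0.5: 1 = bad loan, 0 = good loan (as in loan_condition_cat).
label = 1 if p >= 0.5 else 0
print(f"P(default) = {p:.3f}, predicted loan_condition_cat = {label}")
```

In practice the coefficients would be estimated from the training data, and the cut-off would be chosen according to how costly a missed default is compared with a rejected good loan.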
DATASET ANALYSIS
The dataset we chose: loan_final313.csv (30 attributes, 8 lakh records)
The full dataset has 8 lakh records, but only a sample of the records was considered for this
project, keeping its scope in mind. The random function in Excel was used to select records
randomly, without bias, for data reduction.
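The sampling in this project was done in Excel. Purely as an illustration, the same kind of unbiased data reduction could be sketched in Python with reservoir sampling, which draws k records uniformly at random from a stream without loading every record into memory. The function name reservoir_sample and the seed parameter are assumptions made for this sketch.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Draw k rows uniformly at random, without replacement, from any iterable."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample

# Small demonstration on 1,000 record ids.
picked = reservoir_sample(range(1000), 10, seed=42)
print(len(picked))
```

To sample from the CSV file itself, the rows of a csv.DictReader opened on loan_final313.csv could be passed in place of range(1000).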
The reduced dataset used here has 26 attributes and 30,323 records.
The columns in the dataset include the following attributes:
SR. No.  VARIABLE NAME         DESCRIPTION
1        Id                    Unique ID of the loan borrower
2        Year                  Year when the loan was issued
3        issue_d               Issue date of the loan
4        emp_length_int        Employment length in years: possible values 0-10, where 0
                               means less than one year
5        home_ownership        Status of borrower’s home
6        home_ownership_cat    Category of borrower’s home
7        income_category       Income level of borrower
8        annual_inc            Combined self-reported annual income provided by
                               co-borrowers during registration
9        income_cat            Income category, divided into 3: 1 = up to 1,00,000;
                               2 = up to 2,00,000; 3 = beyond 2,00,000
10       loan_amount           Amount of loan issued
11       Term                  Term for which the loan was issued
12       term_cat              Category 1: 36 months; Category 2: 60 months
13       Purpose               Purpose for loan borrowing
14       purpose_cat           Categories of all purposes
15       interest_payments     Payments of interest on loan: High or Low
16       interest_payment_cat  1 = Low interest payment; 2 = High interest payment
17       loan_condition        Condition of loan: Good or Bad
18       loan_condition_cat    0 = Good loan; 1 = Bad loan
19       interest rate         Interest rate on borrowed loan
20       Grade                 Assigned loan grade
21       grade_cat             Categorized into 6 grades
22       dti                   A ratio calculated using the borrower’s total monthly debt
                               payments on the total debt obligations, excluding mortgage
                               and the requested LC loan, divided by the borrower’s
                               self-reported monthly income
23       total_pymnt           Payment made till date on loan
24       total_rec_prncp       Amount of principal recovered
25       Installment           Amount of loan repaid
26       Region                Region where the customer resides
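Several of the attributes above are numeric codes derived from a companion column (income_cat from annual_inc, term_cat from Term, and so on). The sketch below shows one way such codes could be derived, using the thresholds and codes listed in the table. The raw value formats (Term stored as a number of months, loan_condition and interest_payments stored as the strings "Good"/"Bad" and "Low"/"High") and the helper names are assumptions made for illustration and may not match the actual file.

```python
def income_cat(annual_inc):
    """1: up to 1,00,000; 2: up to 2,00,000; 3: beyond 2,00,000."""
    if annual_inc <= 100_000:
        return 1
    if annual_inc <= 200_000:
        return 2
    return 3

# Code mappings taken from the attribute table; raw value formats are assumed.
TERM_CAT = {36: 1, 60: 2}
LOAN_CONDITION_CAT = {"Good": 0, "Bad": 1}
INTEREST_PAYMENT_CAT = {"Low": 1, "High": 2}

def encode(record):
    """Return a copy of the record with the derived category columns added."""
    out = dict(record)
    out["income_cat"] = income_cat(record["annual_inc"])
    out["term_cat"] = TERM_CAT[record["Term"]]
    out["loan_condition_cat"] = LOAN_CONDITION_CAT[record["loan_condition"]]
    out["interest_payment_cat"] = INTEREST_PAYMENT_CAT[record["interest_payments"]]
    return out

# Hypothetical record, not taken from the dataset.
example = {"annual_inc": 150_000, "Term": 60,
           "loan_condition": "Bad", "interest_payments": "High"}
encoded = encode(example)
print(encoded["income_cat"], encoded["term_cat"],
      encoded["loan_condition_cat"], encoded["interest_payment_cat"])
```

Keeping the mappings in plain dictionaries means an unexpected raw value raises a KeyError instead of being silently assigned a wrong code.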