Business analytics is about understanding business performance and developing new insights into it →
we do this by leveraging data and statistical methods.
Data mining is the name given to a variety of computer-intensive techniques for discovering structure
and finding patterns in data. It can be used, for example, to find classifications.
Cross-Industry Standard Process for Data Mining (CRISP, also written CRISP-DM): a framework to identify
subproblems. It is a useful and easy-to-understand codification of the data mining process. It is an
iterative process with different phases:
● Business understanding phase & Data understanding phase (important to talk to
stakeholders)
○ You go back and forth (iteration) between these two phases before moving on to data
preparation.
● Data preparation phase (also important to talk to stakeholders) & Modelling phase
○ Preparing data (imputing missing values or converting data types)
○ You go back and forth (iteration) between these two phases before moving on to
evaluation. This happens because you change models based on new insights.
● Evaluation phase
○ Feedback from stakeholders
○ Model metrics
● Deployment phase
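The data preparation step above (imputing missing values, converting data types) can be sketched in a few lines of Python. This is only an illustration: the `raw_grades` values are made up, and filling gaps with the mean is an assumed choice, not a method prescribed by the lecture.

```python
from statistics import mean

# Hypothetical raw records: grades arrive as strings and one is missing (None).
raw_grades = ["7.5", "6.0", None, "8.5"]

# Convert data types: string -> float, keeping missing entries as None.
grades = [float(g) if g is not None else None for g in raw_grades]

# Impute: replace each missing value with the mean of the observed values.
# Mean imputation is an assumption made here for illustration only.
observed = [g for g in grades if g is not None]
fill_value = mean(observed)
imputed = [g if g is not None else fill_value for g in grades]

print(imputed)
```

Whether a mean, a median, or another rule is the right fill value is the kind of question to discuss with stakeholders.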
Week 1 Video 2 Preliminaries I
N (rows): the number of observations, for example 50 student grades → 50 entries
M (columns): the number of variables, for example 4 factors → 4 columns
Input variables X (predictors, features, independent variables, variables)
Output variables Y (response variable, dependent variable, response)
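As a rough sketch of this notation, the snippet below stores a small data set as rows and columns and separates the inputs X from the output Y. The column meanings and all values are hypothetical, and the data set is much smaller than the 50-student example.

```python
# Hypothetical data set: N = 5 students (rows), M = 2 input variables (columns).
# Each row of X holds: hours studied, attendance rate (made-up values).
X = [
    [10, 0.90],
    [4, 0.60],
    [7, 0.80],
    [2, 0.50],
    [9, 0.95],
]

# Output variable Y: one response (final grade) per row of X.
Y = [8.5, 5.0, 7.0, 4.0, 9.0]

n_rows = len(X)      # N: number of observations
n_cols = len(X[0])   # M: number of input variables

print(n_rows, n_cols)  # 5 2
```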
Week 1 Video 3 Preliminaries II
Float: a number with decimals (7.9 etc.)
Integer: a whole number (7 etc.)
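A short Python sketch of the two types, and of what happens when one is converted to the other. The variable names are illustrative only.

```python
grade = 7.9   # float: a number with decimals
count = 7     # integer: a whole number

print(type(grade).__name__)  # float
print(type(count).__name__)  # int

# Converting a float to an integer drops the decimals (truncation),
# while round() goes to the nearest whole number.
truncated = int(grade)
rounded = round(grade)

print(truncated, rounded)  # 7 8
```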
Week 1 Video 4 Prediction vs Inference I
Y = f(X) + ϵ (ϵ is called epsilon)
To predict Y we use Ŷ = f̂(X), where f̂ is our estimate of f; the error term ϵ cannot be predicted from X.
Y    Ŷ    (Y - Ŷ)²
9    10   1
7    8    1
6    3    9
E(Y - Ŷ)² is the expected value (average) of these squared differences.
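The squared differences in the table can be reproduced with a few lines of Python. Averaging them gives a sample estimate of E(Y - Ŷ)²; the variable names are assumptions made for this sketch.

```python
# Observed values Y and predictions Ŷ from the table.
y = [9, 7, 6]
y_hat = [10, 8, 3]

# Squared difference per row: (Y - Ŷ)²
squared_errors = [(actual - predicted) ** 2 for actual, predicted in zip(y, y_hat)]
print(squared_errors)  # [1, 1, 9]

# Average squared difference (mean squared error) over the three rows.
mse = sum(squared_errors) / len(squared_errors)
print(round(mse, 2))  # 3.67
```

Note how the single large miss (6 vs 3) dominates the average, because errors are squared.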
Var(ϵ): the amount of variance associated with the error term ϵ.
Week 1 Video 5 Prediction vs Inference II
It is very important that the method that you choose is aligned with the question that you are trying to
answer.
Prediction → it doesn't matter how complex a method is; only accuracy is important.
Inference → if you are looking for inference, then the inner workings of the model are much more
important, so the choice between a restrictive and a flexible method matters much more.
Week 2 Video 1 Clustering Methods
| X1 | X2 | X3 | X4 | ← Partitions
X = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Characteristics of partitions:
1) Partitions do not need to be of equal size → the total size is | X | = 10
2) X1 ∪ X2 ∪ X3 ∪ X4 = X (∪ means union, i.e. putting the partitions together)
○ So the union of all partitions gives us the whole set
3) X1 ∩ X2 = ø, and likewise for any other pair of partitions ( ∩ means intersection, ø means empty set)
∪_{i=1}^{n} X_i = X (the union of all subpartitions is the whole set)
∩_{i=1}^{n} X_i = ø (the subpartitions have no element in common)
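The partition properties above can be checked mechanically. The sketch below uses Python sets; the particular split of X into four parts and the helper name `is_partition` are hypothetical, chosen only to illustrate uneven sizes.

```python
from itertools import combinations

X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

# Hypothetical split of X into four parts of uneven size.
parts = [{1, 2}, {3, 4, 5}, {6}, {7, 8, 9, 10}]

def is_partition(parts, whole):
    """Return True if the parts cover `whole` and no two parts overlap."""
    # Property 2: the union of all parts gives the whole set.
    covers = set().union(*parts) == whole
    # Property 3: every pair of different parts has an empty intersection.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    return covers and disjoint

print(is_partition(parts, X))  # True
```

Checking every pair is slightly stricter than checking only that the intersection of all parts is empty, and it is the condition that rules out overlapping parts.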
Week 2 Video 2 K-Means Clustering I
Let’s say we have 3 factors:
U = [1, 0, 0]
V = [1, 1, 0]
W = [1, 0, 1]
But how do you get the distance between U (red dot) and W (green dot)?
c = √(a² + b²)
U = [1, 0, 0]
W = [1, 0, 1]
d(U, W) = √[(1-1)^2 + (0-0)^2 + (0-1)^2] = √1 = 1
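The same distance calculation can be written as a small Python function that works for any number of factors. The function name `euclidean_distance` is an assumption made for this sketch.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared differences between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

U = [1, 0, 0]
V = [1, 1, 0]
W = [1, 0, 1]

print(euclidean_distance(U, W))  # 1.0
print(euclidean_distance(U, V))  # 1.0
print(euclidean_distance(V, W))  # square root of 2, about 1.414
```

V and W differ in two coordinates, so they are further apart than either is from U.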