Decision Tree Structure
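Nodes = Attributes being tested (categories such as income)
Branches = Outcomes of the test (binary split: exactly 2)
Leaves = Class labels (assigned from the training examples that reach the leaf)
*Continuous attributes (such as income) require choosing a split point (such as 80k)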
Low Impurity
Skewed data: one class dominates (more useful)
High Impurity
Roughly evenly split data (less useful)
Measure of Node Impurity: Gini Index
Measure of Node Impurity: Entropy
Measure of Node Impurity: Misclassification error
Comparison among Impurity Measures
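A minimal sketch of the three impurity measures, using their standard textbook definitions (the notes above do not spell the formulas out). The class counts in the loop are made up for illustration.

```python
import math

def gini(counts):
    """Gini index: 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits: sum(-p_i * log2(p_i)), skipping empty classes."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

def misclassification_error(counts):
    """Misclassification error: 1 - max(p_i)."""
    total = sum(counts)
    return 1.0 - max(counts) / total

# Hypothetical class counts per node: pure, skewed, evenly split
for counts in [(10, 0), (9, 1), (5, 5)]:
    print(counts,
          round(gini(counts), 3),
          round(entropy(counts), 3),
          round(misclassification_error(counts), 3))
```

All three measures are 0 for a pure node and reach their maximum for an evenly split node (0.5 for Gini and error, 1 bit for entropy with two classes).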
Best Attribute for Splitting (based on impurity change)
Gain = impurity before split - weighted sum of impurities after split (weight = |Dj| / |D|)
Dj = j-th partition after split
*A split can improve Gini while misclassification error stays the same
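A sketch of the gain calculation (impurity before the split minus the weighted impurity after it). The class counts are hypothetical, chosen so that Gini improves while misclassification error does not; `impurity_gain` is an illustrative helper name, not from the notes.

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def misclassification_error(counts):
    total = sum(counts)
    return 1.0 - max(counts) / total

def impurity_gain(parent, partitions, measure):
    """Impurity before the split minus the weighted sum after it.

    Each partition Dj is weighted by |Dj| / |D|.
    """
    n = sum(parent)
    weighted_after = sum(sum(part) / n * measure(part) for part in partitions)
    return measure(parent) - weighted_after

# Hypothetical node with 7 records of one class and 3 of the other,
# split into partitions holding (3, 0) and (4, 3) records.
parent = (7, 3)
partitions = [(3, 0), (4, 3)]

gini_gain = impurity_gain(parent, partitions, gini)
error_gain = impurity_gain(parent, partitions, misclassification_error)
print(f"Gini gain:  {gini_gain:.4f}")
print(f"Error gain: {error_gain:.4f}")
```

Here the Gini gain is positive, but the error gain is zero (up to floating-point rounding): the majority class in each partition misclassifies the same 3 records as before the split.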
Count Matrix
Split Positions are based on 80k example from first slide
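Move left to right across the candidate split positions (top row); at each position, update the class counts in each row to reflect the records that shift from one side of the split to the other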
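One way to sketch the left-to-right scan over split positions, assuming candidate splits are midpoints between adjacent sorted values and that Gini is the impurity measure. The incomes (in thousands) and labels are hypothetical, and `best_split` is an illustrative name.

```python
from collections import Counter

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def best_split(values, labels):
    """Return (split_point, weighted_gini) for the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # Count matrix: the left side starts empty, the right side holds every record
    right = Counter(labels)
    left = Counter({label: 0 for label in right})
    best_point, best_gini = None, float("inf")
    for i in range(n - 1):
        value, label = pairs[i]
        # Shift one record from the right side to the left side
        left[label] += 1
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no split position between equal values
        n_left = i + 1
        weighted = (n_left / n) * gini(list(left.values())) \
            + ((n - n_left) / n) * gini(list(right.values()))
        if weighted < best_gini:
            best_point, best_gini = (value + next_value) / 2, weighted
    return best_point, best_gini

# Hypothetical incomes (in thousands) and class labels
incomes = [60, 70, 75, 85, 90, 95]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(incomes, labels))
```

Updating the counts incrementally avoids recounting every record at each split position, which is the point of scanning the sorted values once.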
Gain Ratio
Info Gain / Split Info
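A sketch of gain ratio, assuming the usual definitions: info gain is the entropy-based gain, and split info is the entropy of the partition sizes. The class counts are hypothetical.

```python
import math

def entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

def gain_ratio(parent, partitions):
    """Info gain divided by split info for one candidate split."""
    n = sum(parent)
    weights = [sum(part) / n for part in partitions]
    info_gain = entropy(parent) - sum(
        w * entropy(part) for w, part in zip(weights, partitions)
    )
    # Split info grows with the number of partitions
    split_info = sum(-w * math.log2(w) for w in weights if w > 0)
    return info_gain / split_info if split_info > 0 else 0.0

parent = (5, 5)
two_way = [(5, 0), (0, 5)]              # clean binary split
ten_way = [(1, 0)] * 5 + [(0, 1)] * 5   # one record per partition

print(gain_ratio(parent, two_way))
print(gain_ratio(parent, ten_way))
```

Both splits have the same info gain, but the ten-way split has a larger split info, so its gain ratio is lower. This is how gain ratio penalizes attributes that fragment the data into many small partitions.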
Decision Tree Pros/Cons
Pros:
- Inexpensive to construct
- Fast
- Can handle redundant and irrelevant attributes
Cons:
- Prefers more discriminating attributes (those with many distinct values)
- Only outputs 0 or 1 (a hard class label)
Bayes Theorem
Bayes Classifier
P(Xd | yes) • P(yes) > or < P(Xd | no) • P(no)
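A sketch of the comparison above, assuming the naive (conditional independence) form in which P(X | class) is the product of per-attribute likelihoods. All probabilities are made-up numbers, and `classify` is an illustrative name.

```python
from math import prod

def class_score(prior, likelihoods):
    """P(X | class) * P(class), with P(X | class) as a product over attributes."""
    return prior * prod(likelihoods)

def classify(priors, likelihoods):
    """Pick the class with the larger P(X | class) * P(class)."""
    scores = {
        label: class_score(priors[label], likelihoods[label])
        for label in priors
    }
    return max(scores, key=scores.get), scores

# Hypothetical priors and per-attribute likelihoods for one record X
priors = {"yes": 0.3, "no": 0.7}
likelihoods = {
    "yes": [0.5, 0.4],  # P(x1 | yes), P(x2 | yes)
    "no": [0.2, 0.1],   # P(x1 | no),  P(x2 | no)
}

label, scores = classify(priors, likelihoods)
print(label, scores)
```

The shared denominator P(X) is omitted because it is the same for both classes and does not affect which score is larger.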
Used to estimate probabilities for continuous attributes (without discretization)
Probability Density Estimation
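A sketch of density estimation for a continuous attribute, assuming the values within each class follow a normal distribution (a common choice; the notes do not say which density is used). The income values are hypothetical.

```python
import math
import statistics

def fit_gaussian(values):
    """Estimate mean and sample variance from one class's training values."""
    return statistics.mean(values), statistics.variance(values)

def gaussian_pdf(x, mean, variance):
    """Normal density, used in place of P(x | class) for a continuous x."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)

# Hypothetical incomes (in thousands) of the training records in one class
incomes_no = [125, 100, 70, 120, 60, 220, 75]
mean, variance = fit_gaussian(incomes_no)

# Density at income = 120, to be multiplied into the class score
print(gaussian_pdf(120, mean, variance))
```

The result is a density, not a probability, so it can exceed 1 for small variances. That is fine for comparing classes, since every class is scored the same way.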
How to handle 0s in Bayes
Coin Flip
Laplace Estimate
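A sketch of the Laplace estimate, assuming the usual add-one form (count + 1) / (class total + number of attribute values). The counts are hypothetical.

```python
def laplace_estimate(count, class_total, num_values):
    """Smoothed P(attribute = value | class).

    count:       training records in the class with this attribute value
    class_total: training records in the class
    num_values:  number of distinct values the attribute can take
    """
    return (count + 1) / (class_total + num_values)

# Hypothetical: a value never seen among the 3 records of a class,
# for an attribute with 2 possible values
print(laplace_estimate(0, 3, 2))  # unsmoothed estimate would be 0/3
print(laplace_estimate(3, 3, 2))  # unsmoothed estimate would be 3/3
```

Without smoothing, a single zero would wipe out the whole product P(X | class) no matter what the other attributes say.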
Confusion Matrix
Confusion Matrix: Accuracy
(TP+TN)/All
Confusion Matrix: Error
(FP+FN)/All
Confusion Matrix: Precision (= Positive Predictive Value)
TP / (TP+FP)
*Precision and recall typically trade off: raising one tends to lower the other
Confusion Matrix: Recall (= Sensitivity)
TP / (TP+FN)
Confusion Matrix: Specificity
TN / (TN+FP)
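A sketch that computes the confusion-matrix metrics above from the four cell counts. The counts are hypothetical; returning 0.0 for an empty denominator is an assumption made here to keep the example runnable.

```python
def safe_div(numerator, denominator):
    # Assumption: report 0.0 when the denominator is empty
    return numerator / denominator if denominator else 0.0

def confusion_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "accuracy": safe_div(tp + tn, total),
        "error": safe_div(fp + fn, total),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
    }

# Hypothetical counts
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and error always sum to 1, since every record is either classified correctly or not.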