Data Mining: Concepts and Techniques
Chapter 12. Outlier Analysis
◼ Outlier and Outlier Analysis
◼ Outlier Detection Methods
◼ Statistical Approaches
◼ Proximity-Based Approaches
◼ Clustering-Based Approaches
◼ Classification Approaches
◼ Mining Contextual and Collective Outliers
◼ Outlier Detection in High-Dimensional Data
◼ Summary
What Are Outliers?
◼ Outlier: A data object that deviates significantly from the normal
objects as if it were generated by a different mechanism
◼ Ex.: An unusual credit card purchase; in sports, Michael Jordan, Wayne
Gretzky, ...
◼ Outliers are different from noise
◼ Noise is random error or variance in a measured variable
◼ Noise should be removed before outlier detection
◼ Outliers are interesting: they violate the mechanism that generates the
normal data
◼ Outlier detection vs. novelty detection: at an early stage a novel pattern
is treated as an outlier, but it is later merged into the model
◼ Applications:
◼ Credit card fraud detection
◼ Telecom fraud detection
◼ Customer segmentation
◼ Medical analysis
Types of Outliers (I)
◼ Three kinds: global, contextual and collective outliers
◼ Global outlier (or point anomaly)
◼ An object Og is a global outlier if it significantly deviates from the rest of the data set
◼ Ex. Intrusion detection in computer networks
◼ Issue: Find an appropriate measurement of deviation
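One possible measurement of deviation for global outliers is the z-score, which expresses how many standard deviations a value lies from the mean. The sketch below is only an illustration under stated assumptions: the function name `global_outliers`, the threshold of 2.5, and the purchase amounts are all hypothetical and do not come from the slides.

```python
from statistics import mean, stdev

def global_outliers(values, threshold=2.5):
    """Return the values whose absolute z-score exceeds `threshold`.

    The threshold is an assumed, tunable parameter, not a prescribed value.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the rest.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical purchase amounts; one value is far from all the others.
purchases = [20, 25, 22, 19, 24, 21, 23, 20, 22, 5000]
print(global_outliers(purchases))
```

With small samples a single extreme value inflates the standard deviation, so the z-score cannot grow very large; this is one reason the choice of deviation measure and threshold is a genuine issue rather than a detail.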
◼ Contextual outlier (or conditional outlier)