Data Mining: Concepts and Techniques
Chapter 10. Cluster Analysis: Basic Concepts and Methods
◼ Cluster Analysis: Basic Concepts
◼ Partitioning Methods
◼ Hierarchical Methods
◼ Density-Based Methods
◼ Grid-Based Methods
◼ Evaluation of Clustering
◼ Summary
What is Cluster Analysis?
◼ Cluster: A collection of data objects
    ◼ similar (or related) to one another within the same group
    ◼ dissimilar (or unrelated) to the objects in other groups
◼ Cluster analysis (or clustering, data segmentation, …)
    ◼ Finding similarities between data according to the
      characteristics found in the data and grouping similar data
      objects into clusters
◼ Unsupervised learning: no predefined classes (i.e., learning by
  observation, as opposed to supervised learning, which learns
  from labeled examples)
◼ Typical applications
    ◼ As a stand-alone tool to get insight into data distribution
    ◼ As a preprocessing step for other algorithms
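
The idea of grouping similar objects without predefined classes can be sketched in a few lines. The example below is only an illustration, not a method taken from these slides: it assumes a simple k-means style procedure (one of the partitioning methods listed in the outline), uses Euclidean distance as the similarity measure, and picks the first k points as starting centers. The function name `kmeans` and the sample points are hypothetical.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters (illustrative k-means sketch)."""
    # Assumption: the first k points serve as initial centers.
    # This is deterministic but not a robust initialization.
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centers

# Hypothetical data: two visually separate groups, no class labels given.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
          (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied to the procedure; the two groups emerge only from the distances between points, which is what makes this unsupervised. The resulting cluster labels could then feed a later algorithm, matching the preprocessing use mentioned above.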
Clustering for Data Understanding and Applications
◼ Biology: taxonomy of living things: kingdom, phylum, class, order,
family, genus and species
◼ Information retrieval: document clustering
◼ Land use: Identification of areas of similar land use in an earth
observation database