DATA MINING
GETTING TO KNOW YOUR DATA
Highlights
◼ Data Objects and Attribute Types
◼ Basic Statistical Descriptions of Data
◼ Data Visualization
◼ Measuring Data Similarity and Dissimilarity
Types of Datasets
◼ Record
  ◼ Relational records
  ◼ Data matrix, e.g., numerical matrix, crosstabs (a table showing relationship between two or more variables)
  ◼ Document data: text documents, term-frequency vector
  ◼ Transaction data
◼ Graph and network
  ◼ World Wide Web
  ◼ Social or information networks
  ◼ Molecular Structures
◼ Ordered
  ◼ Video data: sequence of images
  ◼ Temporal data (varies with time): time-series
  ◼ Sequential Data (sequence matters): transaction sequences
  ◼ Genetic sequence data
◼ Spatial, image and multimedia:
  ◼ Spatial data: maps
  ◼ Image data
  ◼ Video data

Example document data (term-frequency vectors):
Document 1    3 0 5 0 2 6 0 2 0 2
Document 2    0 7 0 2 1 0 0 3 0 0
Document 3    0 1 0 0 1 2 2 0 3 0

Example transaction data:
TID    Items
1      Bread, Coke, Milk
2      Butter, Bread
3      Butter, Coke, Napkin, Milk
4      Butter, Bread, Napkin, Milk
5      Coke, Napkin, Milk
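
To make the link between transaction data and a data matrix concrete, the following is a minimal Python sketch that turns the five transactions from the TID/Items table into a binary presence matrix, with one row per transaction and one column per item. The variable names (`transactions`, `items`, `matrix`) and the alphabetical column order are assumptions made for illustration; they do not come from the slides.

```python
# Transactions copied from the TID/Items table; names are illustrative only.
transactions = {
    1: ["Bread", "Coke", "Milk"],
    2: ["Butter", "Bread"],
    3: ["Butter", "Coke", "Napkin", "Milk"],
    4: ["Butter", "Bread", "Napkin", "Milk"],
    5: ["Coke", "Napkin", "Milk"],
}

# Assumed column order: one column per distinct item, sorted alphabetically.
items = sorted({item for basket in transactions.values() for item in basket})

# One row per transaction: 1 if the item is present, 0 otherwise.
matrix = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

print("TID", items)
for tid, row in matrix.items():
    print(tid, row)
```

Each row is now a fixed-length vector, so the same record-style tools that work on a numerical data matrix can be applied to the transactions.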
Important Characteristics of Structured Data
◼ Dimensionality
  ◼ Curse of dimensionality
◼ Sparsity
  ◼ Only presence counts
◼ Distribution
  ◼ Centrality and dispersion
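
As a rough illustration of these three characteristics, the sketch below computes them for the three term-frequency vectors shown on the Types of Datasets slide. It assumes sparsity is measured as the fraction of zero entries, and it uses the mean and the population standard deviation as stand-ins for centrality and dispersion; the slides do not fix either choice. The helper `describe` is hypothetical.

```python
from statistics import mean, pstdev

# Term-frequency vectors copied from the document data example.
documents = {
    "Document 1": [3, 0, 5, 0, 2, 6, 0, 2, 0, 2],
    "Document 2": [0, 7, 0, 2, 1, 0, 0, 3, 0, 0],
    "Document 3": [0, 1, 0, 0, 1, 2, 2, 0, 3, 0],
}


def describe(vector):
    """Summarize one vector (hypothetical helper, not from the slides)."""
    n = len(vector)
    zeros = sum(1 for value in vector if value == 0)
    return {
        "dimensionality": n,       # number of attributes
        "sparsity": zeros / n,     # assumed: fraction of zero entries
        "mean": mean(vector),      # one measure of centrality
        "std": pstdev(vector),     # one measure of dispersion
    }


summary = {name: describe(vec) for name, vec in documents.items()}

for name, stats in summary.items():
    print(name, stats)
```

Other measures, such as the median for centrality or the interquartile range for dispersion, could be substituted without changing the structure of the sketch.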