ISTE 600-RIT-Dubai Session 2022
Foundations of Data Mining
Quiz 2
Practical Applications of Learning Machines
Student’s Name: Score:
Exercise 1: (20 points)
An expert searching for the optimal clustering of the data via the ubiquitous kMeans
clustering algorithm has found the following total within-cluster sum of squares (TWCSS):
k               1          2          3          4          5          6          7          8
TWCSS   244373.87   89337.83   51063.48   49512.16   12338.52   10882.62    8616.20   10305.54
1. Compute the minimum number of clusters for an expert who needs to explain at least
80% of the variation captured and explained by the kMeans clustering machinery.
2. Construct a sketch of the scree plot of percentage of variation explained by this kMeans
clustering machinery.
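Hint: the percentages asked for in Exercise 1 can be checked numerically. The sketch below is one possible way to do so, not a prescribed method. It assumes that the TWCSS at k = 1 equals the total sum of squares, so that the percentage of variation explained with k clusters is 100 × (1 − TWCSS(k) / TWCSS(1)). The function names percent_explained and min_clusters are illustrative and do not come from any library.

```python
# TWCSS values from the table above, indexed by number of clusters k = 1..8.
twcss = [244373.87, 89337.83, 51063.48, 49512.16,
         12338.52, 10882.62, 8616.20, 10305.54]


def percent_explained(values):
    """Percentage of variation explained for each k.

    Assumption: values[0] (TWCSS at k = 1) is the total sum of squares.
    """
    total = values[0]
    return [100.0 * (1.0 - w / total) for w in values]


def min_clusters(values, threshold):
    """Smallest k whose explained percentage reaches threshold, else None."""
    for k, pct in enumerate(percent_explained(values), start=1):
        if pct >= threshold:
            return k
    return None


# Print the values needed for the scree plot and the 80% question.
for k, pct in enumerate(percent_explained(twcss), start=1):
    print(f"k = {k}: {pct:6.2f}% explained")
print("smallest k reaching 80%:", min_clusters(twcss, 80.0))
```

The printed percentages are the heights to plot against k in the scree plot. Note that the TWCSS at k = 8 is larger than at k = 7, so the curve is not strictly increasing at the end.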
Exercise 2: (30 points)
The following table provides the values of the idiosyncratic (individual) variances of each of the
components in a principal component analysis learning task conducted by a data scientist.
     Comp.1  Comp.2  Comp.3  Comp.4  Comp.5  Comp.6  Comp.7  Comp.8  Comp.9
λ    291.79   42.02   14.62    5.42    2.35    1.01    0.40    0.06   0.009
1. What is the estimated number of components in the latent space for an investigator
seeking to explain at least 90% of the variation in the data?
2. Construct a sketch of the scree plot of this Principal Component Analysis learning task.
3. Comment extensively on how this plot relates to your answer to the first question.
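Hint: the cumulative proportion of variance in Exercise 2 can be checked numerically. The sketch below is one possible way to do so, assuming the λ values in the table are the component variances (eigenvalues) and that their sum is the total variation in the data. The function names cumulative_percent and min_components are illustrative and do not come from any library.

```python
# Component variances (eigenvalues) from the table above, Comp.1 to Comp.9.
eigenvalues = [291.79, 42.02, 14.62, 5.42, 2.35, 1.01, 0.40, 0.06, 0.009]


def cumulative_percent(lams):
    """Cumulative percentage of total variance after each component."""
    total = sum(lams)
    running = 0.0
    out = []
    for lam in lams:
        running += lam
        out.append(100.0 * running / total)
    return out


def min_components(lams, threshold):
    """Smallest number of components reaching threshold percent, else None."""
    for q, pct in enumerate(cumulative_percent(lams), start=1):
        if pct >= threshold:
            return q
    return None


# Print individual and cumulative percentages for the scree plot discussion.
total = sum(eigenvalues)
for q, (lam, cum) in enumerate(zip(eigenvalues, cumulative_percent(eigenvalues)), start=1):
    print(f"Comp.{q}: {100.0 * lam / total:6.2f}% individual, {cum:6.2f}% cumulative")
print("components needed for 90%:", min_components(eigenvalues, 90.0))
```

The individual percentages give the heights of the scree plot, and the cumulative column shows where the 90% threshold is crossed, which is the link the third question asks about.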