CS 7641 - FINAL REVIEW | VERIFIED STUDY GUIDE
Single Linkage Clustering - Answers - Treat each object as its own cluster. Define the distance between two clusters as the distance between their two closest points, then merge the closest two clusters into one. Repeat n - k times to end up with k clusters (sketch below).
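A minimal sketch of that merge loop in plain numpy; the function name and toy data are illustrative, not from the course:

```python
import numpy as np

def single_linkage(X, k):
    """Agglomerative single-linkage clustering down to k clusters."""
    clusters = [[i] for i in range(len(X))]  # every point starts alone
    while len(clusters) > k:                 # i.e. repeat n - k times
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of points.
                d = min(np.linalg.norm(X[a] - X[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)       # merge the two closest clusters
    return clusters

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
print(single_linkage(X, k=2))                # -> [[0, 1, 2], [3, 4]]
```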
Mutual Information - Answers - How much one variable tells us about the other variable; a measure of the dependence (similarity) between two variables.
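As a worked illustration, mutual information can be computed from a joint distribution as I(X;Y) = sum over x,y of p(x,y) log( p(x,y) / (p(x) p(y)) ); the joint table below is made up:

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over two binary variables.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1)  # marginal p(x)
py = pxy.sum(axis=0)  # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
mi = sum(pxy[i, j] * np.log2(pxy[i, j] / (px[i] * py[j]))
         for i in range(2) for j in range(2)
         if pxy[i, j] > 0)
print(f"I(X;Y) = {mi:.3f} bits")  # > 0: the variables are dependent
```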
Folk Theorem - Answers - In a repeated game, any feasible payoff profile that beats each player's minmax (security) level can be sustained as a Nash equilibrium, given a sufficiently large discount factor.
k-means Clustering - Answers - Pick k random cluster centers. Assign each point to the closest cluster center, recompute each center as the mean of its assigned points, and repeat until convergence. Error monotonically decreases and the algorithm converges, but only to a local optimum, so it can get stuck; best to do repeated random restarts (sketched below).
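A minimal k-means sketch with restarts, assuming Euclidean distance; all names here are illustrative:

```python
import numpy as np

def kmeans(X, k, rng, iters=100):
    # Pick k distinct random data points as initial centers.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its closest center (Euclidean distance).
        labels = np.linalg.norm(X[:, None] - centers[None, :],
                                axis=2).argmin(axis=1)
        # Recompute each center as the mean of its assigned points;
        # keep the old center if a cluster went empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged to a local optimum
            break
        centers = new
    return centers, labels

def kmeans_restarts(X, k, n_restarts=10):
    # k-means only finds a local optimum, so restart from several
    # initializations and keep the lowest within-cluster error.
    best = None
    for seed in range(n_restarts):
        centers, labels = kmeans(X, k, np.random.default_rng(seed))
        sse = np.sum((X - centers[labels]) ** 2)
        if best is None or sse < best[0]:
            best = (sse, centers, labels)
    return best[1], best[2]
```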
Soft Clustering - Answers - Assigning each point a probability of belonging to each cluster, rather than a single hard assignment.
Expectation Maximization - Answers - Alternates two steps. Expectation (soft clustering): compute the likelihood that data element i came from cluster j. Maximization: recompute cluster parameters (e.g. the means) from those soft assignments. Allows clusters to overlap; the likelihood never decreases, so EM does not diverge, but it is not guaranteed to converge in theory (in practice it does) and can get stuck in local optima.
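A minimal EM sketch for a two-mean, 1-D Gaussian mixture; fixing unit variances and equal mixing weights is a simplifying assumption for brevity, not part of the definition:

```python
import numpy as np
from scipy.stats import norm

def em_two_means(x, iters=50):
    mu = np.array([x.min(), x.max()])  # crude initialization
    for _ in range(iters):
        # E-step (soft clustering): likelihood that element i came from
        # cluster j, assuming unit variance and equal mixing weights.
        lik = np.stack([norm.pdf(x, mu[0], 1.0), norm.pdf(x, mu[1], 1.0)])
        resp = lik / lik.sum(axis=0)   # responsibilities, shape (2, n)
        # M-step: means become responsibility-weighted averages.
        mu = (resp @ x) / resp.sum(axis=1)
    return mu

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
print(em_two_means(x))  # roughly [0, 5]
```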
Cluster Properties - Answers - Richness, Scale-invariance, Consistency
Richness - Answers - For any desired partition of the points, there is some distance matrix D under which the clustering scheme produces that partition.
Scale-Invariance - Answers - The cluster assignments do not change if all distances are multiplied by a positive constant (e.g. switching miles to kilometers).
Consistency - Answers - Shrinking intracluster distances and expanding intercluster distances does not change the clustering.
Impossibility Theorem - Answers - (Kleinberg) No clustering scheme can achieve all three of richness, scale-invariance, and consistency.
Filtering - Answers - Reduce the feature set first, then apply learning. Fast; scoring criteria include variance, information gain/entropy, or the features a decision tree selects. Can look at the labels, but ignores the learner itself (sketch below).
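A filtering sketch using scikit-learn's SelectKBest with a mutual-information score; the synthetic dataset is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
# Filtering: score each feature against the labels (here by mutual
# information), keep the top 5, and only then hand the data to a learner.
selector = SelectKBest(mutual_info_classif, k=5)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (200, 5)
```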
Wrapping - Answers - Use the learner itself to evaluate feature subsets. Much slower than filtering, but accounts for the learner's bias. Examples: randomized optimization algorithms, forward search, and backward search.
Forward Search - Answers - Start with an empty feature set. At each step, try adding each remaining feature on its own, keep whichever one most improves the learner's performance, and repeat until no added feature helps (e.g. test {1}, {2}, ...; keep {3}; then test {3,1}, {3,2}, ...). A sketch follows below.
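A forward-search wrapping sketch that greedily adds the feature giving the largest cross-validated gain; the choice of k-NN as the wrapped learner and the stopping rule are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selected, best_score = [], 0.0
while True:
    # Try adding each remaining feature and score the wrapped learner.
    trials = {f: cross_val_score(KNeighborsClassifier(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in range(X.shape[1]) if f not in selected}
    f, score = max(trials.items(), key=lambda kv: kv[1])
    if score <= best_score:  # stop when no feature helps anymore
        break
    selected, best_score = selected + [f], score
print(selected, round(best_score, 3))
```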