What are the steps of k-means clustering? - 1. Pick k cluster centers within the range of the data
2. Assign each data point to its nearest cluster center
3. Recalculate each cluster center as the centroid (mean) of the points assigned to it
Repeat steps 2 & 3 until no data point changes clusters.
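The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, the choice of starting centers (k randomly chosen data points), and the sample points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k starting centers (assumption: k distinct data points).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Stop once no data point changes clusters.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recalculate each center as the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical sample data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

On this sample the first three points end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random starting centers.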
What is a heuristic algorithm? - An algorithm that is not guaranteed to find the absolute best
solution, but that usually gets very close to the best solution, and does so quickly.
Why is the k-means algorithm considered an expectation-maximization algorithm? - Every time
the algorithm repeats steps 2 & 3, it alternates between taking an expectation (each cluster
center is the mean of the points in its cluster) and maximizing the negative of the distance
from each data point to its cluster center: expectation, maximization, expectation,
maximization, etc.
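Written out, the two alternating steps look as follows. The notation is an assumption introduced for this sketch: $x_i$ is a data point, $z_j$ is the center of cluster $j$, $C_j$ is the set of points currently assigned to cluster $j$, and $c(i)$ is the cluster chosen for point $i$.

```latex
% Expectation: each center is the mean of the points in its cluster
z_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i

% Maximization: each point picks the center that maximizes the negative distance
c(i) = \arg\max_{j} \left( -\lVert x_i - z_j \rVert \right)
```

Maximizing the negative distance is the same as choosing the nearest center.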
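When using k-means clustering, how do we know how many clusters k to use? - 1. The number
should be appropriate for the real-life situation
2. Look at the elbow graph (total distance from the data points to their cluster centers,
plotted against k) and find the kink, where adding more clusters stops helping much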
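The values behind an elbow graph can be computed with a short sketch like the one below. It is illustrative only: the helper `total_distance`, its `seed` and `max_iter` parameters, the use of summed Euclidean distance as the y-axis quantity, and the sample points are assumptions for the example, and the k-means loop inside is a bare-bones version.

```python
import math
import random


def total_distance(points, k, seed=0, max_iter=100):
    """Run a basic k-means and return the summed distance of points to their centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random data points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # no point changed clusters
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical sample data with two natural groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
curve = {k: total_distance(points, k) for k in range(1, 5)}
for k, d in curve.items():
    print(k, round(d, 2))
```

For this sample the total distance falls sharply from k = 1 to k = 2 and only slightly after that, so the kink sits at k = 2.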
What's the difference between classification models and clustering models? After all, both
involve grouping data points. - In classification models, we already know the correct response
for each training data point. In clustering models, we do not know up front the right grouping
of the data points. The model has to decide how to cluster based solely on the attributes.
What is supervised learning? - When the algorithm knows the correct response for each data
point, and that information is used to train the model
What is unsupervised learning? - When the algorithm does not know the correct response for
each data point and must use only the data's attributes
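The difference shows up directly in the shape of the input data. The sketch below is a hypothetical illustration: the data, the response names "small" and "large", and the helper `predict_nearest` (a simple 1-nearest-neighbor rule, chosen only because it is short) are assumptions for the example.

```python
import math

# Supervised: each training point comes with a known response.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 9.5), "large"),
]


def predict_nearest(train, x):
    """Return the response of the training point closest to x (1-nearest-neighbor)."""
    _, response = min(train, key=lambda pair: math.dist(pair[0], x))
    return response


print(predict_nearest(labeled, (8.5, 9.0)))  # prints "large"

# Unsupervised: only the attributes are available, with no responses.
unlabeled = [attrs for attrs, _ in labeled]
print(unlabeled)
```

A clustering algorithm such as k-means would receive only `unlabeled` and would have to form the groups from the attributes alone.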