VERIFIED ANSWERS |2026/2027 UPDATE | A+ GRADED
Cross Validation - Answer -Avoids the problem of each data point being confined to only one of the
training, validation, or testing portions. If important data points appear in only one portion, the
model is either never fit to them or never evaluated on them, so the quality estimate can be misleading.
There are several ways to conduct cross validation.
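K-fold Cross Validation - Answer -1. Set aside 20% for testing; use the remaining 80% for training and validation
2. Split that 80% into k parts (k = 4 in this example). Train on k-1 parts and validate on the
remaining part, rotating until every part has been used for validation once, so each point has been
used to train 3 of the models. To compare models and choose one, take the average of the 4 evaluations
3. k = 10 is very common
4. Makes good use of the data and gives a better estimate of model quality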
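The rotation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `k_fold_splits` and `cross_validate`, the straight-line model, and the mean-squared-error score are all assumptions chosen for the example.

```python
import numpy as np

def k_fold_splits(n_points, k, seed=0):
    """Yield (train_idx, val_idx) index pairs, one pair per fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_points)   # shuffle so folds are random
    folds = np.array_split(shuffled, k)    # k roughly equal parts
    for i in range(k):
        val_idx = folds[i]                 # this part validates
        # the other k-1 parts train
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def cross_validate(x, y, k=4):
    """Average validation error of a fitted line over the k folds.

    Assumption: the model is a least-squares line and the score is
    mean squared error; any model and score could be swapped in.
    """
    scores = []
    for train_idx, val_idx in k_fold_splits(len(x), k):
        slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
        pred = slope * x[val_idx] + intercept
        scores.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(scores))          # average of the k evaluations
```

With k = 4, every point lands in exactly one validation fold and in three training sets, which matches the description above. The 20% test portion would be held out before calling `cross_validate`.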
Clustering - Answer -taking a set of data points and dividing them into groups so that each group
contains points that are close to each other or similar. Example uses: targeted marketing, image
analysis. Can also help reveal groupings you did not expect.
Distance Norms - Answer -In mathematics, the Euclidean distance or Euclidean metric is the
"ordinary" straight-line distance between two points in Euclidean space. With this distance,
Euclidean space becomes a metric space. The associated norm is called the Euclidean norm.
Rectilinear Distance - Answer -the distance between two points with a series of 90-degree
turns, as along city blocks
p-norm Distance - Answer -a generalization of Euclidean and rectilinear distance: p = 1 gives
rectilinear distance and p = 2 gives Euclidean distance. Applies in any number of dimensions.
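Infinity Norm - Answer -the distance is the largest difference along any single coordinate. Used
when movements happen simultaneously, ex. a crane that moves boxes along an aisle and up/down at
the same time, so the travel time depends only on the longer of the two movements.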
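The distance measures above can be compared with one small function. This is a sketch under stated assumptions: the name `p_norm_distance` is made up for illustration, and the infinity norm is handled as a special case that takes the largest coordinate difference.

```python
import numpy as np

def p_norm_distance(a, b, p):
    """Distance between points a and b under the p-norm.

    p = 1 is rectilinear, p = 2 is Euclidean, and p = float("inf")
    is the infinity norm (largest single-coordinate difference).
    """
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))

# The same pair of points under the three norms:
origin, target = (0, 0), (3, 4)
rectilinear = p_norm_distance(origin, target, 1)             # 3 + 4
euclidean = p_norm_distance(origin, target, 2)               # sqrt(9 + 16)
infinity = p_norm_distance(origin, target, float("inf"))     # max(3, 4)
```

For the points (0, 0) and (3, 4) the three distances are 7, 5, and 4, so the value shrinks as p grows.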
K means clustering - Answer -Informally, goal is to find groups of points that are close to each
other but far from points in other groups
• Each cluster is defined entirely and only by its centre, or mean value µk
• Classified as an Expectation Maximization algorithm
Steps in k-Means clustering - Answer -1. Decide how many clusters we want the algorithm to
give us; test different values of k
2. Temporarily assign each data point to the cluster center it is closest to
3. Recalculate the cluster centers, since the current ones may not be the true centers: each new
center (centroid) is the mean of all data points within the cluster
4. Now that the centroids have been found, reassign each point to the cluster with the closest centroid
5. Find the new cluster centers and reassign data points; repeat until no data points need to be
reassigned and the centroids stop changing
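The assign-and-recalculate loop in the steps above can be sketched as follows. This is a minimal illustration, not a production implementation: the function name `k_means`, the `n_iter` cap, and the choice of k random data points as starting centers are assumptions (the steps do not say how the first centers are chosen), and Euclidean distance is assumed.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Return (centers, labels) after running the k-means loop."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assign each point to its closest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recalculate each center as the mean (centroid) of its points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:      # leave an empty cluster's center alone
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated groups of three points each.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels = k_means(data, k=2)
```

Trying several values of k, as step 1 suggests, would mean calling `k_means` once per candidate k and comparing the resulting clusterings.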