and answers 2026\2027 A+ Grade
Support Vector Machine
- correct answer A supervised learning classification model. It identifies the extreme points in the
data (the support vectors) and places the margin boundaries against them. The hyperplane midway between
these boundaries is the classifier
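As a rough illustration of how the hyperplane acts as the classifier, the sketch below assumes a linear SVM whose weights have already been found. The names `w`, `b`, `decision_value`, `classify` and `margin_width` are hypothetical, and the weights are chosen by hand rather than learned.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical weights, picked by hand for illustration (not a trained model)
w, b = (1.0, 1.0), -3.0

# (1, 1) and (2, 2) score exactly -1 and +1, so they sit on the margin
# boundaries and play the role of support vectors in this toy setup
label_low = classify(w, b, (1.0, 1.0))
label_high = classify(w, b, (2.0, 2.0))
width = margin_width(w)
```

Training an SVM amounts to finding the `w` and `b` that make this margin as wide as possible while keeping the classes on opposite sides.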
SVM Pros/Cons
- correct answer Pros: It works well when there is a clear margin of separation between classes
It is effective in high dimensional spaces.
It is effective in cases where the number of dimensions is greater than the number of samples.
It uses a subset of training points in the decision function (called support vectors), so it is also memory
efficient.
Cons: Not good for very large data sets
Not good when the data set is noisy, i.e. the target classes overlap
Doesn't directly provide probability estimates.
K-nearest neighbor (K-NN)
- correct answer A supervised classification algorithm. Looks at the K closest labeled points to the
new one and classifies it as whichever class is most common among them.
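A minimal sketch of this voting rule, assuming Euclidean distance and labeled training points. The function name `knn_classify` and the toy data are made up for illustration.

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, new_point, k=3):
    """Label new_point by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every stored training point
    distances = [
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration)
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_classify(points, labels, (1.1, 1.0), k=3)
```

Note that every prediction scans the whole training set, which is where the computational cost of K-NN comes from.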
K-nearest neighbor (K-NN) Pros/Cons
- correct answer Pros: No assumptions about data
Easy to understand/Interpret
Versatile
Cons: Computationally expensive because the algorithm stores all training data
Sensitive to irrelevant features and scale of data
k-fold cross validation
- correct answer Validation technique where the data is divided into k subsets (folds). Each subset
is used in turn for testing while the rest are used for training. The algorithm rotates through each
subset and averages the results
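The rotation can be sketched as follows. This is a simplified version with no shuffling, and `fit` and `score` are hypothetical caller-supplied functions standing in for whatever model is being validated.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(data, k, fit, score):
    """Average score(model, test) over k folds; fit and score come from the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        model = fit(train)
        scores.append(score(model, test))
    return sum(scores) / len(scores)

# Toy usage: the "model" is just the training mean, scored by mean absolute error
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fit_mean = lambda train: sum(train) / len(train)
mae = lambda model, test: sum(abs(x - model) for x in test) / len(test)
avg_error = cross_validate(data, k=3, fit=fit_mean, score=mae)
```

Because the folds here are contiguous slices, the data would normally be shuffled first, except for time series, where the order matters.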
K Fold cross Validation Pros/Cons
- correct answer Pros: Validates Performance of model
Can keep the predicted classes balanced across the folds
Cons: Doesn't work well with time series data
The averaged score of your model could hide some important extreme values, or overpower them so
they're harder to pick up on
k-means clustering
- correct answer Unsupervised learning heuristic that starts by assigning k cluster centers, then
assigns every data point to the nearest center based on distance. The center point of each cluster is
then recalculated and all data points are reassigned. The process repeats until no data points
change clusters. The ideal number of clusters can be identified with an elbow diagram.
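The loop described above can be sketched like this, assuming Euclidean distance and initial centers drawn at random from the data. The function name `k_means`, its parameters and the toy points are illustrative, not a reference implementation.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Assign/update loop for k-means; returns (centers, assignments)."""
    rng = random.Random(seed)
    # Start from k data points chosen at random as the initial centers
    centers = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Stop when no point changed cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignments

# Two well-separated toy blobs (made up for illustration)
blobs = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, assignments = k_means(blobs, k=2)
```

Changing `seed` changes the starting centers, which on less cleanly separated data can change the final clusters.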
k-means pros and cons
- correct answer Pros: Simple to implement
Scales well to large data sets
Easily adaptable
Cons: K must be chosen manually, and the result depends on the initial center values
Sensitive to outliers
Grubbs Outlier Test
- correct answer A formula that uses a suspected outlier's value, the mean of the data, and the standard
deviation to determine whether the data point is within the confidence interval for a normal distribution or
should be thrown out
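A small sketch of the calculation, assuming the usual form of the statistic: the largest absolute deviation from the mean divided by the sample standard deviation. The critical value is not computed here. It is a hypothetical parameter the caller would look up for the sample size and significance level, and the threshold used below is a placeholder only.

```python
import statistics

def grubbs_statistic(values):
    """Return (G, suspect): the largest absolute deviation from the mean,
    in units of sample standard deviation, and the value that produced it."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / stdev, suspect

def is_outlier(values, critical_value):
    """Flag the most extreme value if its G exceeds the caller-supplied threshold."""
    g, suspect = grubbs_statistic(values)
    return g > critical_value, suspect

# Toy sample (made up); 1.8 is a placeholder threshold, not a table value
sample = [10.0, 10.2, 9.9, 10.1, 10.0, 15.0]
g, suspect = grubbs_statistic(sample)
flagged, _ = is_outlier(sample, critical_value=1.8)
```

The test assumes the data are roughly normally distributed and checks one suspect value at a time.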
CUSUM
- correct answer Change detection model that keeps a running total of the amount that observations