Answers Latest Updated 2025/2026 (Graded
A+) Georgia Institute of Technology.
Section 1: Machine Learning Basics (Questions 1–8)
Q1: Which of the following best describes the goal of supervised learning?
A. To discover hidden patterns in unlabeled data without any target output
B. To learn a mapping from input features to output labels using labeled training
data
C. To reduce the dimensionality of a dataset while preserving maximum variance
D. To cluster similar data points together based on distance metrics
Correct Answer: B
Rationale: Correct because supervised learning explicitly requires labeled data
where the algorithm learns to map inputs to known outputs, as in regression and
classification tasks.
Q2: In the bias-variance tradeoff, which scenario correctly describes underfitting?
A. Low bias, high variance, excellent training performance but poor test
performance
B. High bias, low variance, poor performance on both training and test data
C. Low bias, low variance, excellent performance on both training and test data
D. High bias, high variance, erratic performance across all datasets
Correct Answer: B
Rationale: Correct because underfitting occurs when a model is too simple to
capture the underlying data pattern, resulting in high bias and low variance with
consistently poor performance on both training and test sets.
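The underfitting case in Q2 can be illustrated with a minimal sketch. It assumes a hypothetical toy dataset generated from y = x^2 and fits a straight line to it by ordinary least squares. The data, function names, and the choice of a linear model are illustrative assumptions, not part of the original question.

```python
# Toy illustration of underfitting (high bias): a straight line fit to
# quadratic data. The dataset and helper names are hypothetical.

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training and test points both come from the same curve, y = x^2.
train_x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
train_y = [x ** 2 for x in train_x]
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x ** 2 for x in test_x]

slope, intercept = fit_line(train_x, train_y)
train_mse = mse(train_x, train_y, slope, intercept)
test_mse = mse(test_x, test_y, slope, intercept)

print("train MSE:", train_mse)
print("test MSE:", test_mse)
```

Because a line cannot represent the curvature, the error stays large on both the training and the test points, which matches option B. A quadratic model would fit this particular data exactly.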
Q3: A data scientist uses 5-fold cross-validation to evaluate a model. How many
times is each data point used for training, and how many times for validation?
A. Training: 4 times, Validation: 1 time
B. Training: 5 times, Validation: 0 times
C. Training: 1 time, Validation: 4 times
D. Training: 3 times, Validation: 2 times
Correct Answer: A
Rationale: Correct because in k-fold cross-validation, each fold serves as the
validation set exactly once while the remaining k-1 folds are used for training, so
each point trains 4 times and validates once.
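The counting argument in Q3 can be checked with a short sketch. It assumes contiguous, unshuffled folds and a hypothetical dataset of 10 samples. The helper `kfold_indices` is written here for illustration and is not a library function.

```python
from collections import Counter

def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds (no shuffling)."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if not (start <= i < stop)]
        yield train_idx, val_idx
        start = stop

# Count how often each sample index lands in a training or validation split.
train_counts = Counter()
val_counts = Counter()
for train_idx, val_idx in kfold_indices(n_samples=10, k=5):
    train_counts.update(train_idx)
    val_counts.update(val_idx)

print(set(train_counts.values()))  # {4}
print(set(val_counts.values()))    # {1}
```

Every index appears in the training split k-1 = 4 times and in the validation split exactly once, which is option A.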
Q4: Which loss function is most robust to outliers in a regression task?
A. Mean Squared Error (MSE)
B. Mean Absolute Error (MAE)
C. Huber Loss
D. Binary Cross-Entropy
Correct Answer: B
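Rationale: Correct because MAE computes the average absolute difference,
which is less sensitive to large deviations than MSE since it does not square the
errors, making it more robust to outliers. Huber loss is also linear for large
errors, but it stays quadratic for errors below its threshold, so its robustness
depends on how that threshold is chosen.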
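A small numeric comparison makes the Q4 rationale concrete. The sketch below assumes hypothetical predictions in which every error is 1, then replaces one prediction so that a single error becomes 10. The Huber threshold `delta=1.0` is an illustrative choice, not a value taken from the question.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_clean = [2.0, 3.0, 4.0, 5.0, 6.0]     # every error is 1
pred_outlier = [2.0, 3.0, 4.0, 5.0, 15.0]  # last error is 10

for name, loss in [("MSE", mse), ("MAE", mae), ("Huber", huber)]:
    print(name, loss(y_true, pred_clean), loss(y_true, pred_outlier))
# MSE 1.0 20.8
# MAE 1.0 2.8
# Huber 0.5 2.3
```

The single outlier raises MSE from 1.0 to 20.8, while MAE and Huber loss grow only linearly with the size of that one error.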
Q5: Which of the following is NOT an example of unsupervised learning?
A. K-means clustering for customer segmentation
B. Principal Component Analysis for dimensionality reduction
C. Logistic regression for spam email classification
D. Gaussian Mixture Models for density estimation
Correct Answer: C
Rationale: Correct because logistic regression is a supervised classification
algorithm that requires labeled training data, whereas the other options are
unsupervised methods that operate on unlabeled data.
Q6: A model exhibits very low training error but high validation error. Which of the
following is the most appropriate diagnosis?
A. The model is underfitting and requires increased complexity
B. The model is overfitting and would benefit from regularization
C. The model has high bias and needs more training epochs