Modeling
Comprehensive Final Examination (150+ Questions)
1. Which of the following is the PRIMARY purpose of data preprocessing in analytics modeling?
A) To make the dataset visually appealing
B) To ensure data quality and suitability for modeling
C) To reduce the number of variables to exactly 10
D) To automatically select the best algorithm
✅ Correct Answer: B
Rationale: Data preprocessing addresses missing values, outliers, scaling, and encoding to ensure data
quality and model compatibility. It does not guarantee visual appeal (A), arbitrarily limit variables (C),
or auto-select algorithms (D).
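To make this concrete, here is a minimal preprocessing chain sketched in scikit-learn (the library choice, DataFrame, and column names are illustrative assumptions, not part of the exam):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: a missing numeric value and a nominal column.
df = pd.DataFrame({"age": [25, None, 41], "segment": ["a", "b", "a"]})

# Chain the usual quality fixes: impute and scale numerics, encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
])
print(preprocess.fit_transform(df))  # model-ready numeric matrix
```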
2. When handling missing data, which method is MOST appropriate when data is Missing Completely
at Random (MCAR) and the proportion missing is small (<5%)?
A) Delete all rows with any missing values
B) Use multiple imputation
C) Replace with mean/median/mode
D) Build a model to predict missing values
✅ Correct Answer: C
Rationale: For MCAR data with minimal missingness, simple imputation (mean for numeric, mode for
categorical) preserves sample size with minimal bias. Listwise deletion (A) wastes data; multiple
imputation (B) and predictive modeling (D) are overkill for small MCAR gaps.
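For illustration only (the tooling is not specified by the exam), a minimal scikit-learn sketch of simple imputation for small MCAR gaps; the DataFrame and column names are hypothetical:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical dataset with a few MCAR gaps (well under 5% in practice).
df = pd.DataFrame({
    "age": [25, 30, None, 41, 38],          # numeric -> median
    "segment": ["a", "b", "b", None, "a"],  # categorical -> mode
})

# Median imputation preserves all five rows for the numeric column.
df[["age"]] = SimpleImputer(strategy="median").fit_transform(df[["age"]])

# Mode (most frequent) imputation for the categorical column.
df[["segment"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["segment"]])

print(df)  # no rows lost, minimal bias under MCAR
```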
3. Which scaling method is MOST appropriate when your data contains significant outliers?
A) Min-Max Scaling
B) Z-score Standardization
C) Robust Scaling
D) Decimal Scaling
✅ Correct Answer: C
Rationale: Robust scaling uses the median and IQR, which are resistant to outliers. Min-Max (A) and Z-score (B) are sensitive to extreme values; decimal scaling (D) is rarely used and still outlier-sensitive.
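A hedged comparison sketch (using scikit-learn, which the exam does not prescribe) showing why median/IQR scaling resists a single extreme value; the toy data are made up:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# One feature whose last value is an extreme outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
    scaled = scaler.fit_transform(x)
    # Min-Max and Z-score squeeze the inliers together because the outlier
    # dominates the range/standard deviation; RobustScaler (median, IQR) does not.
    print(type(scaler).__name__, scaled.ravel().round(2))
```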
4. One-hot encoding is primarily used for:
A) Scaling continuous variables
B) Converting categorical variables with no ordinal relationship into binary features
C) Reducing dimensionality of high-cardinality features
D) Handling missing values in categorical data
✅ Correct Answer: B
Rationale: One-hot encoding creates binary columns for each category level when no natural order
exists. It does not scale continuous data (A), can increase dimensionality for high-cardinality features
(C), and doesn't address missingness (D).
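As a quick illustration (pandas, with a hypothetical column name), one-hot encoding turns a nominal feature into one binary column per level:

```python
import pandas as pd

# Nominal feature with no ordinal relationship among levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One 0/1 indicator column per category level.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)  # columns: color_blue, color_green, color_red
```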
5. Which statement about feature selection is TRUE?
A) Feature selection always improves model accuracy
B) Feature selection reduces overfitting by eliminating irrelevant features
C) Feature selection is unnecessary when using regularization
D) Feature selection should only be performed after model training
✅ Correct Answer: B
Rationale: Removing irrelevant/redundant features reduces model complexity and overfitting risk.
Feature selection doesn't guarantee accuracy improvements (A), complements but doesn't replace
regularization (C), and should occur before/during training, not after (D).
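A minimal sketch of filter-style feature selection before training (scikit-learn; the synthetic dataset is illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 features, only 5 of which actually carry signal.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Keep the 5 features with the strongest univariate association to y,
# applied before model fitting to reduce complexity and overfitting risk.
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("kept feature indices:", selector.get_support(indices=True))
```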
6. In the context of data partitioning, what is the PRIMARY purpose of a validation set?
A) To train the final model
B) To evaluate final model performance for reporting
C) To tune hyperparameters and select models during development
D) To replace the test set when data is limited
✅ Correct Answer: C
Rationale: The validation set guides hyperparameter tuning and model selection without
contaminating the test set. Training uses the training set (A); final evaluation uses the test set (B);
validation doesn't replace testing (D).
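One common way to carve out all three sets (a sketch; the 60/20/20 ratios are an assumption, not the only valid choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(-1, 1), np.arange(100)

# First hold out the test set; it is touched only for final reporting.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the remainder into train (fitting) and validation (tuning/selection).
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 / 20 / 20
```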
7. Which cross-validation method is MOST appropriate for time series data?
A) k-fold cross-validation with random shuffling
B) Stratified k-fold cross-validation
C) Forward chaining (rolling window) cross-validation
D) Leave-one-out cross-validation
✅ Correct Answer: C
Rationale: Time series require that temporal order be preserved; forward chaining respects chronology. The other schemes (A, B, D) mix future observations into training folds, leaking information and violating time series assumptions.
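For illustration, scikit-learn's TimeSeriesSplit implements exactly this forward-chaining idea (the toy array is hypothetical):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 time-ordered observations

# Each fold trains on an expanding past window and validates on the
# block immediately after it, so no future data leaks into training.
for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "validate:", val_idx)
```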
8. When is stratified sampling MOST beneficial during data partitioning?
A) When the target variable is continuous
B) When dealing with imbalanced classification problems
C) When all features are normally distributed
D) When using unsupervised learning algorithms
✅ Correct Answer: B
Rationale: Stratification maintains class distribution across splits, critical for imbalanced classification.
It's irrelevant for continuous targets (A), feature distributions (C), or unsupervised learning (D).
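A small sketch of stratification in practice (scikit-learn; the 90/10 class mix is invented for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90% class 0, 10% class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y preserves the 90/10 ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(y_tr.mean(), y_te.mean())  # both approximately 0.10
```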
9. Which of the following is a disadvantage of using too many principal components in PCA?
A) Increased computational efficiency
B) Loss of interpretability
C) Reduced model variance
D) Improved handling of multicollinearity
✅ Correct Answer: B
Rationale: More PCs retain more variance but reduce interpretability as components become linear
combinations of many original features. Computational efficiency decreases (A), model variance may
increase (C), and multicollinearity handling isn't the primary concern (D).
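A brief sketch of why interpretability degrades (scikit-learn; random data used purely to show the loadings):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # 10 hypothetical original features

pca = PCA().fit(X)
# Every principal component is a weighted mix of ALL 10 original
# features, so each retained PC adds explained variance but also
# another hard-to-name axis.
print(pca.explained_variance_ratio_.round(3))
print(pca.components_[0].round(2))  # PC1 loadings across all features
```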
10. The curse of dimensionality refers to:
A) Increased model accuracy with more features
B) Problems that arise when analyzing data in high-dimensional spaces
C) The difficulty of visualizing data with more than 3 dimensions
D) The computational cost of data storage
✅ Correct Answer: B
Rationale: High dimensionality causes data sparsity, distance metric degradation, and overfitting risk. It doesn't improve accuracy (A); visualization difficulty (C) is a symptom, not the definition; and storage cost (D) is a separate issue.
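A short numpy demonstration of one symptom, distance concentration (illustrative only; the dimensions and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# As dimensionality grows, the nearest and farthest neighbors of a point
# become nearly equidistant, degrading distance-based methods.
for d in (2, 10, 100, 1000):
    pts = rng.uniform(size=(500, d))
    dists = np.linalg.norm(pts - pts[0], axis=1)[1:]  # distances to point 0
    print(f"d={d:4d}  min/max distance ratio = {dists.min() / dists.max():.2f}")
```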
11. Which outlier detection method is MOST robust to non-normal distributions?
A) Z-score method (|z| > 3)