Advanced SAS Certified Data Scientist
Multiple Choice Questions (MCQs) with
Answers and Explanations for Professional
Certification Exams
1. A data scientist is building a predictive model in SAS Viya and observes signs of high
variance: excellent training accuracy but poor validation performance. Which technique is
most appropriate to improve generalization?
A. Increasing model complexity
B. Reducing the training dataset size
C. Eliminating cross-validation
D. Applying regularization techniques
Explanation: Regularization penalizes excessive model complexity, reducing overfitting and
improving performance on unseen data.
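To make the penalty concrete, here is a minimal sketch in Python rather than SAS. It assumes a one-feature linear model with no intercept and an L2 (ridge) penalty; the function name ridge_slope and the toy data are hypothetical and chosen only for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + lam).
    return sxy / (sxx + lam)


# Hypothetical toy data, roughly y = x with noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.9]

# A larger penalty pulls the fitted slope toward zero.
slopes = [ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0)]
for lam, w in zip((0.0, 1.0, 10.0), slopes):
    print(f"lambda={lam:>4}: slope={w:.3f}")
```

With lam set to 0 this is ordinary least squares. As lam grows, the slope shrinks, which is the loss of flexibility that trades a little training accuracy for lower variance.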
2. Which PROC in SAS is specifically designed for fitting logistic regression models?
A. PROC REG
B. PROC GLM
C. PROC MEANS
D. PROC LOGISTIC
Explanation: PROC LOGISTIC is used to model binary and multinomial outcomes and provides
odds ratios, diagnostics, and classification measures.
3. In SAS, what is the primary purpose of the OUTPUT statement within PROC
LOGISTIC?
A. To define categorical variables
B. To terminate execution
C. To specify optimization methods
D. To create a dataset containing predicted values and residuals
Explanation: The OUTPUT statement stores model predictions, residuals, and influence
statistics for further analysis.
4. A classification model is evaluated using ROC curves. Which metric corresponds to the
area under the ROC curve?
A. Root Mean Squared Error
B. Mean Absolute Error
C. Precision
D. C-statistic
Explanation: The c-statistic is equivalent to the AUC and measures the model's ability to
distinguish between classes.
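The equivalence can be checked by hand. The sketch below, in Python rather than SAS, computes the c-statistic as the share of (event, non-event) pairs in which the event receives the higher score, with ties counted as one half. The function name c_statistic and the labels and scores are hypothetical.

```python
def c_statistic(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted probabilities for three events and three non-events.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
c = c_statistic(labels, scores)
print(f"c-statistic = {c:.3f}")
```

Here 8 of the 9 pairs are concordant, so the value is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.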
5. Which SAS procedure is commonly used for decision tree modeling?
A. PROC REG
B. PROC FREQ
C. PROC CORR
D. PROC HPSPLIT
Explanation: PROC HPSPLIT builds classification and regression trees, with control over
how the tree is grown and pruned.
6. In a highly imbalanced dataset, which evaluation metric is generally more informative
than accuracy?
A. Mean Squared Error
B. R-squared
C. Standard Deviation
D. F1-score
Explanation: The F1-score balances precision and recall, making it suitable when class
distributions are uneven.
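A small worked case shows why accuracy misleads here. The sketch below, in Python rather than SAS, assumes a hypothetical dataset of 5 positives and 95 negatives and a degenerate model that always predicts the negative class; the helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives, so precision and recall contribute nothing
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
f1 = f1_score(y_true, always_negative)
print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")
```

The model never finds a single positive case, yet its accuracy is 0.95. The F1-score of 0 exposes the failure because it depends only on how the positive class is handled.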