MISY 5380 LATEST EXAM PREP QUESTIONS
AND ANSWERS PDF 2026
▶ Association rules? Answer:if-then statements
Convey the likelihood of certain items being purchased together
▶ Antecedent? Answer:The collection of items (or item set) corresponding
to the if portion of the rule
▶ Consequent? Answer:The item set corresponding to the then portion of
the rule
▶ Support count of an item? Answer:Number of transactions in the data
that include that item set
▶ Supervised Learning? Answer:The goal of this learning technique is to
develop a model that predicts a value for a continuous outcome or
classifies a categorical outcome
▶ Partitioning Data? Answer:We can use the abundance of data to guard
against the potential for overfitting by decomposing the data set into three
partitions
the training set
the validation set, and
the test set
▶ Training set? Answer:Consists of the data used to build the candidate
models
▶ Validation set? Answer:The data set to which promising subset of
models is applied to the to identify which model is the most accurate at
predicting when applied to data that were not used to build the model
▶ Test set? Answer:The data set to which the final model should be
applied to estimate this model's effectiveness when applied to data that
have not been used to build or select the model
, ▶ Classification Accuracy? Answer:By counting the classification errors on
a sufficiently large validation set and/or test set that is representative of the
population, we will generate an accurate measure of the model's
classification performance
▶ Classification confusion matrix? Answer:Displays a model's correct and
incorrect classifications
▶ Overall Error Rate? Answer:percentage of misclassified observations
Measure of classification accuracy are based on the classification
confusion matrix
▶ Cutoff Value? Answer:Probability value used to understand the tradeoff
between Class 1 error rate and Class 0 error rate
▶ Cumulative lift chart? Answer:Compares the number of actual Class 1
observations identified if considered in decreasing order of their estimated
probability of being in Class 1 and compares this to the number of actual
Class 1 observations identified if randomly selected
▶ Decile-wise lift chart? Answer:Another way to view how much better a
classifier is at identifying Class 1 observations than random classification.
Observations are ordered in decreasing probability of Class 1 membership
and then considered in 10 equal-sized groups
▶ Prediction Accuracy? Answer:The measures of accuracy are some
function of the error in estimating an outcome for an observation i
▶ k-nearest neighbors? Answer:This method can be used either to classify
an outcome category or predict a continuous outcome.
k-NN uses the k most similar observations from the training set, where
similarity is typically measured with Euclidean distance
▶ Classification and Regression Trees? Answer:Partition a data set of
observations into increasingly smaller and more homogeneous subsets.
At each iteration of the CART method, a subset of observations is split into
two new subsets based on the values of a single variable
▶ Logistic regression? Answer:Attempts to classify a categorical outcome
(y = 0 or 1) as a linear function of explanatory variables
AND ANSWERS PDF 2026
▶ Association rules? Answer:if-then statements
Convey the likelihood of certain items being purchased together
▶ Antecedent? Answer:The collection of items (or item set) corresponding
to the if portion of the rule
▶ Consequent? Answer:The item set corresponding to the then portion of
the rule
▶ Support count of an item? Answer:Number of transactions in the data
that include that item set
▶ Supervised Learning? Answer:The goal of this learning technique is to
develop a model that predicts a value for a continuous outcome or
classifies a categorical outcome
▶ Partitioning Data? Answer:We can use the abundance of data to guard
against the potential for overfitting by decomposing the data set into three
partitions
the training set
the validation set, and
the test set
▶ Training set? Answer:Consists of the data used to build the candidate
models
▶ Validation set? Answer:The data set to which promising subset of
models is applied to the to identify which model is the most accurate at
predicting when applied to data that were not used to build the model
▶ Test set? Answer:The data set to which the final model should be
applied to estimate this model's effectiveness when applied to data that
have not been used to build or select the model
, ▶ Classification Accuracy? Answer:By counting the classification errors on
a sufficiently large validation set and/or test set that is representative of the
population, we will generate an accurate measure of the model's
classification performance
▶ Classification confusion matrix? Answer:Displays a model's correct and
incorrect classifications
▶ Overall Error Rate? Answer:percentage of misclassified observations
Measure of classification accuracy are based on the classification
confusion matrix
▶ Cutoff Value? Answer:Probability value used to understand the tradeoff
between Class 1 error rate and Class 0 error rate
▶ Cumulative lift chart? Answer:Compares the number of actual Class 1
observations identified if considered in decreasing order of their estimated
probability of being in Class 1 and compares this to the number of actual
Class 1 observations identified if randomly selected
▶ Decile-wise lift chart? Answer:Another way to view how much better a
classifier is at identifying Class 1 observations than random classification.
Observations are ordered in decreasing probability of Class 1 membership
and then considered in 10 equal-sized groups
▶ Prediction Accuracy? Answer:The measures of accuracy are some
function of the error in estimating an outcome for an observation i
▶ k-nearest neighbors? Answer:This method can be used either to classify
an outcome category or predict a continuous outcome.
k-NN uses the k most similar observations from the training set, where
similarity is typically measured with Euclidean distance
▶ Classification and Regression Trees? Answer:Partition a data set of
observations into increasingly smaller and more homogeneous subsets.
At each iteration of the CART method, a subset of observations is split into
two new subsets based on the values of a single variable
▶ Logistic regression? Answer:Attempts to classify a categorical outcome
(y = 0 or 1) as a linear function of explanatory variables