BADM 211. Test #2
Data mining adds __________ to data visualization and exploratory analyses. - answer
machine learning models
Machine learning refers to algorithms that __________________. - answer learn
directly from data
The conditional probability that event A will occur given that event B has already
occurred may be written as __________________. - answer P(A | B)
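A minimal sketch of estimating P(A | B) from counts, using P(A | B) = P(A and B) / P(B). The counts and the function name are made up for illustration only.

```python
# Hypothetical counts, invented for this example.
n_b = 40          # records where event B occurred
n_a_and_b = 10    # records where both A and B occurred


def conditional_probability(n_a_and_b, n_b):
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if n_b == 0:
        raise ValueError("P(A | B) is undefined when B never occurs")
    return n_a_and_b / n_b


p_a_given_b = conditional_probability(n_a_and_b, n_b)
print(p_a_given_b)
```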
Which of the following is NOT an example of classification? - answer Prediction of
someone's income
Which of the following is an example of prediction? - answer Forecasting sales
Oversampling rare events is a method used to address _____________ situations. -
answer class imbalance
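One simple way to oversample is to resample the rare class with replacement until it matches the common class. This is only a sketch under that assumption; the helper name, the "fraud" field, and the toy records are hypothetical.

```python
import random


def oversample_minority(records, label_key, rare_label, seed=0):
    """Duplicate rare-class records (sampled with replacement) until the
    rare class is as large as the rest of the data."""
    rng = random.Random(seed)
    rare = [r for r in records if r[label_key] == rare_label]
    common = [r for r in records if r[label_key] != rare_label]
    if not rare or len(rare) >= len(common):
        return list(records)  # nothing to balance
    extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
    return common + rare + extra


# Toy data: 8 NO records and 2 YES records.
data = [{"fraud": "NO"}] * 8 + [{"fraud": "YES"}] * 2
balanced = oversample_minority(data, "fraud", "YES")
print(sum(r["fraud"] == "YES" for r in balanced), len(balanced))
```

Oversampling is normally applied to the training data only, so that performance is still measured on records with the original class ratio.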
If a categorical variable is ordinal, it may be coded as ____________. - answer integer
values
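A small illustration of integer coding for an ordinal variable. The level names and their order are assumed for the example; the point is that the integers preserve the ranking.

```python
# Hypothetical ordinal levels, listed from lowest to highest.
levels = ["low", "medium", "high"]
code = {level: rank for rank, level in enumerate(levels, start=1)}

values = ["medium", "low", "high", "low"]
encoded = [code[v] for v in values]
print(encoded)
```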
Normalizing data is usually accomplished through one of two ways:
1) compute the Z-score of the variable,
and
2) ______________. - answer rescale the variable to a uniform range
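Both normalization options can be sketched in a few lines. The function names are illustrative, the [0, 1] target range is one common choice, and using the population standard deviation (rather than the sample one) is an assumption.

```python
from statistics import mean, pstdev


def z_scores(xs):
    """Option 1: subtract the mean and divide by the standard deviation."""
    m, s = mean(xs), pstdev(xs)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - m) / s for x in xs]


def rescale(xs, lo=0.0, hi=1.0):
    """Option 2: map the smallest value to lo and the largest to hi."""
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        raise ValueError("cannot rescale a constant variable")
    return [lo + (x - x_min) * (hi - lo) / (x_max - x_min) for x in xs]


values = [2, 4, 6, 8]
standardized = z_scores(values)
scaled = rescale(values)
print(standardized)
print(scaled)
```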
The three most effective basic plots are _________________. - answer bar charts, line
graphs, scatter plots
Dummy coding categorical variables can greatly ___________ of the dataset. -
answer inflate the dimension
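To see the dimension growth, here is a hand-rolled dummy-coding sketch. The helper name and the toy rows are hypothetical; real projects would more likely use a library routine for this.

```python
def dummy_code(rows, column):
    """Replace one categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    coded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        coded.append(new_row)
    return coded


rows = [
    {"price": 10, "color": "red"},
    {"price": 12, "color": "blue"},
    {"price": 9, "color": "green"},
]
coded = dummy_code(rows, "color")
print(len(rows[0]), "columns before,", len(coded[0]), "columns after")
```

A single column with m categories becomes m columns here; in practice one of them is often dropped because it is redundant, which still leaves m - 1 new columns.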
In k-fold cross-validation, the model will be fit _________ times before computing an
average error measure. - answer k (once per fold, e.g. 5 times for 5-fold cross-validation)
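A rough sketch of the k-fold loop that counts how many times the model is fit. The "model" is just the training mean, the fold assignment is a simple interleaving, and all names and numbers are placeholders chosen for illustration.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds by interleaving (assumes n >= k)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate_mean_model(y, k):
    """Fit a mean-only model once per fold; return (fits, average fold MAE)."""
    fits = 0
    fold_errors = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)  # the "fit" step
        fits += 1
        mae = sum(abs(y[i] - prediction) for i in held_out) / len(held_out)
        fold_errors.append(mae)
    return fits, sum(fold_errors) / len(fold_errors)


y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 5.0, 9.0, 6.0, 4.0]
fits, avg_error = cross_validate_mean_model(y, k=5)
print(fits, avg_error)
```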
Victoria from Operations has asked you to build a machine learning model to predict
the mean-time-before-failure for an industrial robot. She has provided you with a dataset
that contains 20 predictors, but you can choose the number of observations. Given that
the model will utilize 10-fold cross-validation, you'll need a minimum of __________
samples for minimum predictive accuracy. - answer 223
Rule of thumb of 10 records per predictor: 10 * 20 = 200 training records.
Each 10-fold fit trains on 90% of the data: 200 = 0.9x, so x = 200 / 0.9 = 222.22,
rounded up to 223.
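The arithmetic above can be checked with a short function. It assumes the rule of thumb of 10 records per predictor and rounds up to a whole record; the function and parameter names are illustrative.

```python
import math


def min_samples(n_predictors, k_folds, per_predictor=10):
    """Smallest dataset whose training share, (k-1)/k of the data in each
    fold, still holds per_predictor records for every predictor."""
    train_needed = per_predictor * n_predictors
    return math.ceil(train_needed * k_folds / (k_folds - 1))


needed = min_samples(n_predictors=20, k_folds=10)
print(needed)
```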
Of regression performance measures, __________ is signed, thus giving an indication of
average over- or under-prediction of the response variable. - answer mean error
Look closely at the picture. Given the information provided about this residual
distribution, you should choose ___________ as the regression error performance
measure. - answer residuals are NOT normally distributed (left-skewed) -> RMSE (root
mean square error)
The naive benchmark is the average of ____________ in the ___________ set. -
answer outcome values; training
Given the information provided about a response variable (see picture), the naive
benchmark would be computed as the ________ error of using __________ as a fixed
value for Price. - answer regression; the mean [average (y-bar)]
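A small sketch of the naive benchmark for a numeric outcome: predict the training-set mean for every record and measure the error of that fixed guess. The prices are invented, and RMSE is used here only as one possible error measure.

```python
from math import sqrt
from statistics import mean

# Hypothetical Price values, made up for illustration.
train_prices = [100, 120, 140]
valid_prices = [110, 130]

benchmark = mean(train_prices)  # y-bar from the training set
errors = [actual - benchmark for actual in valid_prices]
benchmark_rmse = sqrt(mean(e ** 2 for e in errors))
print(benchmark, benchmark_rmse)
```

A fitted model is only worth keeping if its error is lower than this benchmark error.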
The naive rule for classification involves classifying all records as members of
____________. - answer the majority class (the most prevalent class)
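The naive rule can be sketched with a counter over the training labels. The labels below are toy data, and accuracy is shown only as an example of what the naive rule would score.

```python
from collections import Counter

# Hypothetical training labels: 7 NO and 3 YES.
labels = ["NO"] * 7 + ["YES"] * 3

majority_class, majority_count = Counter(labels).most_common(1)[0]
predictions = [majority_class] * len(labels)  # ignore all predictors
naive_accuracy = sum(p == a for p, a in zip(predictions, labels)) / len(labels)
print(majority_class, naive_accuracy)
```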
If the classes are well-separated, then a ___________ will exhibit good performance. -
answer small dataset
If we have two classes, YES and NO, and if the YES class is more important (such as
fraud), then sensitivity is the ability of the classifier to accurately identify the
____________. - answer YES members (the percentage of YES members classified correctly)
If we have two classes, YES and NO, and if the YES class is more important (such as
fraud), then the false omission rate is the proportion of ________ predictions that are
wrong. - answer the NO
(we predicted NO, but the record was actually a YES, so we falsely omitted it)
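Sensitivity and the false omission rate can both be read off a confusion matrix. The counts below are invented; YES is treated as the important (positive) class.

```python
# Hypothetical confusion-matrix counts with YES as the positive class.
tp = 30   # actual YES, predicted YES
fn = 10   # actual YES, predicted NO (falsely omitted)
fp = 20   # actual NO, predicted YES
tn = 140  # actual NO, predicted NO

# Sensitivity: share of actual YES records that were caught.
sensitivity = tp / (tp + fn)

# False omission rate: share of NO predictions that were really YES.
false_omission_rate = fn / (fn + tn)

print(sensitivity, false_omission_rate)
```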
In regression, prediction error is computed as _____________. - answer the difference
between the actual outcome value and the predicted outcome value
eᵢ = yᵢ - ŷᵢ
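With the error defined as actual minus predicted, the common summary measures follow directly. The numbers are made up; note that only the mean error keeps the sign, so a positive value here means the model under-predicts on average.

```python
from math import sqrt

# Hypothetical actual and predicted outcome values.
actual = [10, 12, 9, 15]
predicted = [11, 10, 9, 13]

errors = [y - y_hat for y, y_hat in zip(actual, predicted)]  # e_i = y_i - y_hat_i

mean_error = sum(errors) / len(errors)                  # signed
mae = sum(abs(e) for e in errors) / len(errors)         # mean absolute error
rmse = sqrt(sum(e ** 2 for e in errors) / len(errors))  # root mean square error

print(errors, mean_error, mae, rmse)
```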
The first step in creating a gain or lift chart is to ______________. - answer sort the set
of records in descending order (high to low) by propensity
Propensity: the probability of class membership (the probability that a record belongs to
each of the classes), when the outcome variable is categorical [e.g. the propensity to
default]
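The sorting step, followed by the cumulative count that a gains chart plots, might look like this. The (propensity, actual class) pairs are toy values, with 1 marking a member of the class of interest.

```python
# Hypothetical (propensity, actual class) pairs; 1 = class of interest.
records = [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 0), (0.6, 0), (0.8, 1)]

# Step 1: sort by propensity, highest first.
ranked = sorted(records, key=lambda r: r[0], reverse=True)

# Step 2: running total of actual positives, the y-axis of a gains chart.
cumulative_gains = []
running = 0
for _propensity, actual in ranked:
    running += actual
    cumulative_gains.append(running)

print(cumulative_gains)
```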
The naive rule for classification is to classify all records as part of the __________. -
answer the majority class (the most prevalent class)