BADM 211 Exam 1
What does business analytics mean? - answer the science of transforming data into
insights and models that lead to better decisions and add value to individuals,
companies and societies.
What is machine learning? - answer algorithms that can learn rules and associations
from the data; a subfield of artificial intelligence
What are the Four V's of Big Data? - answer 1. Volume (size)
2. Velocity (speed)
3. Variety (structured or not)
4. Veracity (quality)
What is a categorical variable? - answer Nominal or Ordinal; can be coded either as
text or as integers. If coded as text, it must be converted to numbers before starting
the analysis.
Nominal Values - answer data has no order, so we create dummy variables consisting
of 0s and 1s (e.g. male and female)
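A minimal sketch of dummy coding in plain Python, assuming the nominal variable is a list of text labels (the function name `dummy_code` and the example data are hypothetical). Each category gets its own 0/1 column:

```python
def dummy_code(values, categories=None):
    """Turn a nominal variable into one 0/1 dummy column per category."""
    if categories is None:
        # Sort the categories so the column order is reproducible
        categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

gender = ["male", "female", "female", "male"]
dummies = dummy_code(gender)
print(dummies)  # {'female': [0, 1, 1, 0], 'male': [1, 0, 0, 1]}
```

With two categories one column already determines the other, so models such as linear regression usually keep only m - 1 of the m dummies.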
Ordinal Values - answer data has an order, so we replace text with integers that preserve
the ranking between categories
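A minimal sketch of ordinal coding, assuming a hypothetical low/medium/high survey item. The ranking has to be supplied explicitly, since alphabetical order ("high" < "low" < "medium") would not preserve it:

```python
# Hypothetical ranking for a satisfaction survey item, lowest to highest
SATISFACTION_ORDER = ["low", "medium", "high"]

def ordinal_code(values, order):
    """Replace each text label with its rank in the given order (1 = lowest)."""
    rank = {label: i + 1 for i, label in enumerate(order)}
    return [rank[v] for v in values]

coded = ordinal_code(["high", "low", "medium", "high"], SATISFACTION_ORDER)
print(coded)  # [3, 1, 2, 3]
```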
What is a predictor variable? - answer usually denoted by x; used as an input into a
predictive model, also called a feature, input variable, independent variable, or from a
database perspective, a field.
What are various terms used for observations? - answer instance, sample, example,
case, record, pattern, or row.
What is a classification problem? - answer The process of finding a model that describes
and distinguishes data classes and concepts. The problem requires identifying which
category an observation belongs to on the basis of a training set of data containing
observations whose categories are known. Ex. predicting whether or not it will rain
tomorrow (rain / no rain).
What is a prediction problem? - answer Captures many common machine learning
problems; prediction is similar to classification, but instead of assigning a class label
it predicts the value of a numerical variable (e.g. amount of purchase rather than
purchaser or non-purchaser).
Prediction: refers to the prediction of the value of a continuous variable (estimation &
regression are sometimes used to refer to prediction of the value of a variable)
What is supervised learning? - answer the data contain an outcome variable; made up of
classification and prediction; the outcome is categorical (classification) or continuous (prediction)
What is unsupervised learning? - answer when there is no outcome to predict or classify;
goal is to find associations and patterns; made up of association rules (also called
affinity analysis), dimension reduction, clustering, and collaborative filtering
(e.g. personalized Netflix recommendations)
The performance of data mining algorithms can be improved by - answer limiting the
number of variables, converting categorical variables into numerical values, and
reducing the data.
When is oversampling rare events useful? - answer When classes are present in very
unequal or imbalanced proportions, simple random sampling may produce too few of
the rare class to yield useful information about what distinguishes them from the
dominant class.
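A minimal sketch of one simple form of oversampling: randomly re-drawing rare-class records (with replacement) until both classes are the same size. It assumes a binary outcome, records stored as hypothetical `(features, label)` pairs, and at least one rare-class record; the 95/5 data is made up for illustration.

```python
import random
from collections import Counter

def oversample(rows, rare_label, seed=0):
    """rows: list of (features, label) pairs; binary outcome assumed."""
    rng = random.Random(seed)
    rare = [row for row in rows if row[1] == rare_label]
    common = [row for row in rows if row[1] != rare_label]
    # Draw extra copies of rare-class rows until both classes have equal counts
    extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
    balanced = common + rare + extra
    rng.shuffle(balanced)
    return balanced

rows = [([i], "no") for i in range(95)] + [([i], "yes") for i in range(5)]
balanced = oversample(rows, "yes")
print(sorted(Counter(label for _, label in balanced).items()))  # [('no', 95), ('yes', 95)]
```

Oversampling is normally applied to the training set only, so the validation set still reflects the real class proportions.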
Dimension Reduction - answer Process of reducing the number of variables to consider
in a data-mining approach; improves predictive power, manageability, and
interpretability
Data Reduction - answer consolidating a large number of samples into a smaller set;
clustering can be used to do this
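A minimal sketch of data reduction by clustering, using basic k-means on one-dimensional data for simplicity (the 1-D setting, the starting centroids, and the data values are all assumptions for illustration). Eight records are summarized by three cluster centers plus their sizes:

```python
def kmeans_1d(values, centroids, iterations=10):
    """Basic k-means on 1-D data; returns final centroids and cluster sizes."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assign each value to its nearest centroid
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, [len(c) for c in clusters]

values = [1, 2, 3, 10, 11, 12, 50, 51]
centers, sizes = kmeans_1d(values, centroids=[1, 10, 50])
print(centers, sizes)  # [2.0, 11.0, 50.5] [3, 3, 2]
```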
What are the main types of categorical variables and how can we code them? -
answer Can be coded as text or numerically; categorical variables must be converted
for the computer to process them.
Ordinal - categories with an ordering (rankings, order, or scaling); replace text with
integers preserving the ranking between categories
Nominal - categories with no ordering or direction; create dummy variables (e.g. is female
or is male, each converted to 0 or 1)
If you have 20 predictors and 2 classes, then you'll need a minimum of how many
cases? - answer 6 x m x p, where m = number of classes and p = number of predictors:
6 x 2 x 20 = 240 cases
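The 6 x m x p rule of thumb as a one-line helper (the function name is hypothetical):

```python
def min_records(num_classes, num_predictors):
    """Rule of thumb: at least 6 x m x p records (m classes, p predictors)."""
    return 6 * num_classes * num_predictors

print(min_records(2, 20))  # 240
```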
Identification of outliers is followed by - answer normalizing the data (scaling the data).
Common pre-processing steps:
1. Find missing values
2. Look for outliers
3. Normalize the data
4. Convert categorical variables to numerical variables
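A minimal sketch of steps 2 and 3, using only the standard library. The two normalizations shown are z-scores (subtract the mean, divide by the standard deviation) and min-max scaling to [0, 1]; flagging outliers as values with |z| > 3 is a common rule of thumb assumed here, not something the notes specify. Function names and data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def min_max(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score is beyond +/- threshold (assumed rule)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

print(min_max([40, 50, 60, 70, 80]))    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(flag_outliers([10] * 20 + [100]))  # [100]
```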
One way of dealing with missing values is by - answer data imputation: replacing the
missing value with the mean or median of the non-missing records for that variable
(regression can also be used to predict it); another way is omission (deleting the records).
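A minimal sketch of median/mean imputation and of omission, assuming missing values are stored as `None` (function names and data are hypothetical; regression-based imputation is not shown):

```python
from statistics import mean, median

def impute(values, how="median"):
    """Replace None with the median (or mean) of the non-missing values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if how == "median" else mean(observed)
    return [fill if v is None else v for v in values]

def omit(rows):
    """Drop any record that has a missing value."""
    return [row for row in rows if None not in row]

ages = [25, None, 35, 40, None]
print(impute(ages))                            # [25, 35, 35, 40, 35]
print(omit([(1, 2), (None, 3), (4, 5)]))       # [(1, 2), (4, 5)]
```

Imputation keeps every record at the cost of some accuracy; omission is simpler but can throw away a lot of data when many records have at least one gap.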