BADM 211 Test 2
In a dataset, rows and columns correspond to ________ and _________, respectively. - answer:
rows = records (observations/data points)
columns = variables (the attributes measured for each record)
A numerical variable can be defined as ___________. - answer: a variable whose measurement
or number has numerical meaning (e.g., values can be meaningfully added or averaged)
Two types of categorical variables are ____________ and ____________. - answer:
nominal (unordered categories)
ordinal (ordered/ranked categories)
We need to dummy code _____________. - answer: nominal data
Two ways of rescaling the data are ____________. - answer: standardizing (subtract the
mean from each column and divide by its standard deviation) and normalizing (rescale to
0-1 by subtracting the min and dividing by max - min)
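A minimal sketch of both rescalings on one column, written in plain Python. The column values are hypothetical, and it assumes the sample standard deviation (the default of pandas `.std()`); some tools, such as scikit-learn's `StandardScaler`, use the population standard deviation instead, which gives slightly different z-scores.

```python
from statistics import mean, stdev

def standardize(values):
    """z-score: subtract the column mean, divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def normalize(values):
    """Min-max: subtract the minimum, divide by the range (max - min), giving 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [30.0, 45.0, 60.0, 75.0, 90.0]  # hypothetical column, in $1,000s
print(normalize(incomes))    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(incomes))  # mean 0, standard deviation 1
```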
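The main reason for doing data partitioning is _____________. - answer: to evaluate how well a model performs on data it was not fit to (fit on the training partition, assess on the validation/test partition), which guards against overfitting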
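Partitioning here means randomly splitting the records into a training set (used to fit the model) and a validation set (used only to evaluate it). A minimal sketch, assuming a 60/40 split and a fixed seed for reproducibility; `partition` is a hypothetical helper, not a library function.

```python
import random

def partition(records, train_frac=0.6, seed=1):
    """Randomly split records into training and validation sets (hypothetical helper)."""
    shuffled = records[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, so the split is repeatable
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(10))                  # stand-in for 10 rows of data
train, valid = partition(records)
print(len(train), len(valid))              # 6 4
```

Every record lands in exactly one partition, so the validation rows stay unseen while the model is fit.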
Scatter plots represent ___________. - answer: the relationship between two numerical
variables, i.e., how one variable changes as the other changes (correlation; this shows association, not causation)
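The strength of the linear pattern a scatter plot shows is usually summarized by the Pearson correlation coefficient r, which ranges from -1 to 1. A minimal from-scratch sketch; the variable names and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-movement of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ad_spend = [1, 2, 3, 4, 5]         # hypothetical
sales = [3, 5, 7, 9, 11]
print(pearson_r(ad_spend, sales))  # approximately 1.0: a perfect positive linear pattern
```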
We use _________ to visualize the entire distribution of a variable. - answer: boxplots
and histograms
What is linear regression? - answer: Linear regression is a statistical technique in which the
score of a variable Y is predicted from the score of a second variable X. X is referred to
as the predictor variable and Y as the criterion variable.
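For one predictor, the least-squares line Y-hat = b0 + b1*X has a closed form: b1 is the sum of products of deviations of X and Y divided by the sum of squared deviations of X, and b0 = mean(Y) - b1*mean(X). A minimal sketch with hypothetical data; `fit_simple_ols` is an illustrative name.

```python
def fit_simple_ols(x, y):
    """Least-squares line y_hat = b0 + b1*x for a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

hours = [1, 2, 3, 4, 5]       # predictor X (hypothetical)
score = [52, 55, 61, 64, 68]  # criterion Y (hypothetical)
b0, b1 = fit_simple_ols(hours, score)
print(b0, b1)                 # intercept about 47.7, slope about 4.1
print(b0 + b1 * 6)            # predicted score for X = 6
```

Here b1 about 4.1 reads as "each extra hour is associated with about 4.1 more points".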
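What are predictive modeling and explanatory modeling? What are the differences? - answer:
Predictive modeling: a data mining approach used to predict outcomes for new records (future behavior) as accurately as possible; performance is judged on holdout (validation) data.
Explanatory modeling: a model that fits well with the existing data on which it was trained, used to explain the relationship between the predictors and the outcome. Focus on the coefficients.
Difference: explanatory modeling evaluates how well the model explains an outcome in the data it was fit to; predictive modeling evaluates how well it predicts outcomes for new data.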
Beta coefficients represent ________________. - answer: the change in the
outcome variable for every 1-unit change in the predictor variable (holding the other predictors constant in a multiple regression).
The higher the error, the ____________ the model. - answer: less accurate
In explanatory modeling, goodness of fit can be assessed using ________. - answer: R^2
(coefficient of determination) and the residual standard error
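R^2 compares the model's squared errors to the squared deviations of Y from its own mean: R^2 = 1 - SSE/SST, the share of the variation in Y the model accounts for on the data it was fit to. A minimal sketch with hypothetical values.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE/SST: share of the variation in Y explained by the model."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    sst = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
    return 1 - sse / sst

actual = [3, 5, 7, 9]
print(r_squared(actual, [3, 5, 7, 9]))  # 1.0: perfect fit
print(r_squared(actual, [6, 6, 6, 6]))  # 0.0: no better than always predicting the mean
```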
Mean error gives us information ____________. - answer: on forecast bias, i.e., whether the model is
systematically over- or under-predicting
Four assumptions in linear regression are ________________________. - answer:
1. linearity: the relationship between X and the mean of Y is linear
2. homoscedasticity: the variance of the residuals is the same for any value of X
3. independence: observations are independent of each other
4. normality: for any fixed value of X, Y (equivalently, the residual) is normally distributed
Two measures that can be used to assess predictive performance are ___________. -
answer: MAE (mean absolute error) and RMSE (root mean squared error)
We estimate the beta coefficients of an MLR model using ________ data. -
answer: training
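A minimal sketch of estimating multiple linear regression (MLR) betas by least squares on the training rows only, using NumPy's `np.linalg.lstsq`. The predictors are random and hypothetical, and the outcome is generated without noise from known betas (3.0, 2.0, -1.5) so the recovered values can be checked; with real data the estimates would only be close, not exact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                # two hypothetical predictors
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]    # noise-free outcome with known betas

# Use 60% of the rows for training; the remaining rows would be held out for validation.
train_idx = rng.permutation(n)[:60]
X_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])  # intercept column first
betas, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)
print(betas)   # approximately [3.0, 2.0, -1.5]
```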
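A key shortcoming of using Mean Error to gauge the predictive accuracy of a model
is ___________. - answer: positive and negative errors cancel out, so a model with large errors can still have a mean error near zero (ME measures bias, not the size of the errors)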
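A minimal sketch of mean error (ME), MAE, and RMSE, assuming the common convention error = actual - predicted. The hypothetical data is chosen so the over- and under-predictions offset exactly: ME comes out as 0 even though every prediction misses, which is why MAE and RMSE are used to measure accuracy.

```python
from math import sqrt

def error_metrics(actual, predicted):
    errors = [a - p for a, p in zip(actual, predicted)]  # error = actual - predicted
    n = len(errors)
    me = sum(errors) / n                                 # bias: signs can cancel
    mae = sum(abs(e) for e in errors) / n                # average size of an error
    rmse = sqrt(sum(e * e for e in errors) / n)          # penalizes large errors more
    return me, mae, rmse

actual    = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
me, mae, rmse = error_metrics(actual, predicted)
print(me, mae, round(rmse, 3))   # 0.0 2.5 2.55
```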
Multicollinearity occurs when ___________. - answer: independent/predictor variables
are highly correlated with each other
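One quick screen for multicollinearity is to compute the correlation matrix of the predictors and flag highly correlated pairs. A minimal sketch using `np.corrcoef`; the predictors and values are hypothetical, and the 0.8 cutoff is an illustrative assumption, not a universal rule (variance inflation factors are a more thorough check).

```python
import numpy as np

# Hypothetical predictors: sqft and rooms move together; age is unrelated.
sqft  = np.array([1000, 1500, 1800, 2400, 3000], dtype=float)
rooms = np.array([3, 4, 5, 6, 8], dtype=float)
age   = np.array([30, 5, 22, 12, 18], dtype=float)

names = ["sqft", "rooms", "age"]
corr = np.corrcoef(np.vstack([sqft, rooms, age]))  # each row is one variable

THRESHOLD = 0.8   # illustrative cutoff
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) > THRESHOLD:
            flagged.append((names[i], names[j]))
print(flagged)    # [('sqft', 'rooms')]
```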
The most commonly used alias for matplotlib.pyplot is ____________. - answer: plt (import matplotlib.pyplot as plt)
I can create a figure with n rows and m columns of subplots using the following command ______.
- answer: plt.subplots(n, m)
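`plt.subplots(n, m)` returns a Figure and an n-by-m array of Axes, one per panel; the similarly named `plt.subplot(n, m, index)` instead adds or selects a single Axes at position `index`. A minimal sketch; the `Agg` backend is an assumption so it runs without a display, and the plotted values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")               # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

n, m = 2, 3
fig, axes = plt.subplots(n, m, figsize=(9, 5))  # one Figure, an n-by-m grid of Axes
print(axes.shape)                                # (2, 3)

axes[0, 0].hist([1, 2, 2, 3, 3, 3])             # draw into the top-left panel
axes[1, 2].boxplot([1, 2, 2, 3, 3, 3])          # and into the bottom-right panel
fig.tight_layout()
```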
(T/F) K-means is a hierarchical clustering algorithm. - answer: F (k-means is a partitional clustering algorithm)
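We use an elbow plot to _________. - answer: pick the number of clusters k (the point where adding more clusters stops sharply reducing within-cluster variation)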
The two main types of clustering are ____ and _________. - answer: partitional and
hierarchical
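K-means is a partitional method: it splits the records into k groups at once rather than building a tree of nested clusters. A minimal NumPy sketch of Lloyd's algorithm plus the numbers behind an elbow plot (within-cluster sum of squared errors for each k). Assumptions: a simple deterministic farthest-point initialization is used instead of k-means++, the data points are hypothetical, and in practice you would plot k against SSE and look for the bend.

```python
import numpy as np

def farthest_point_init(X, k):
    """Start from the first row, then repeatedly add the row farthest from the chosen centroids."""
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    return np.array(centroids, dtype=float)

def kmeans(X, k, max_iter=100):
    """Lloyd's algorithm: assign rows to the nearest centroid, move centroids to cluster means."""
    centroids = farthest_point_init(X, k)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])  # keep a centroid in place if its cluster empties
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    sse = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centroids, sse

# Three hypothetical, well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [0, 10], [0, 11], [1, 10], [1, 11]], dtype=float)

# Elbow data: SSE drops sharply up to k = 3, then flattens.
sse_by_k = {k: kmeans(X, k)[2] for k in range(1, 6)}
print(sse_by_k)
```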
Which of the following is not a proper variable type in Python?
Exponential
Floating Point
String
Integer - answer: Exponential
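Python's built-in numeric types include `int` and `float`; exponential (scientific) notation such as `1.5e3` is just a way of writing a float literal, not a separate type. A quick check with `type()` on a few example values:

```python
examples = [42, 3.14, 1.5e3, "BADM211"]
for value in examples:
    print(repr(value), type(value).__name__)
# 42 int
# 3.14 float
# 1500.0 float   <- exponential notation still produces a float
# 'BADM211' str
```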
x="This class is about programming"
Above we define a/an ______________ variable. - answer: string (str)
Which of the following properly defines a list?
my_list = {1,2,3,4}
my_list = [1,2,3,4]
my_list == {1,2,3,4}
my_list == [1,2,3,4] - answer: my_list = [1,2,3,4]
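Why the other options are wrong: curly braces create a set, not a list, and `==` compares two values instead of assigning one. A short illustration:

```python
my_list = [1, 2, 3, 4]   # square brackets: an ordered, mutable list
my_set = {1, 2, 3, 4}    # curly braces: a set (unordered, no duplicates), not a list
print(type(my_list).__name__, type(my_set).__name__)   # list set

# "==" compares; it does not assign. It evaluates to True or False.
print(my_list == [1, 2, 3, 4])   # True
print(my_list == [4, 3, 2, 1])   # False: order matters for lists
```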
Python is a/an ____________ programming language. - answer: object-oriented
In Python, list indices start with the integer ____. - answer: 0
What is the output of the following code snippet?
my_list=[4,0,2,1]
print(my_list[3]) - answer: 1
What is the output of the following code snippet?
my_list=[5,1,4,2]
print(my_list[-3]) - answer: 1
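Negative indices count from the end: index `i < 0` refers to position `len(list) + i`, so `-1` is the last element and `-3` in a four-item list is position 1. A short illustration:

```python
my_list = [5, 1, 4, 2]
# Index:      0  1  2  3
# Negative:  -4 -3 -2 -1
i = -3
print(my_list[i])                  # 1
print(my_list[len(my_list) + i])   # 1: the same element, position 4 + (-3) = 1
```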
What is the output of the following code snippet?
str1="Class"
str2="BADM211"
print(str1+str2) - answer: ClassBADM211 (print displays the string without quotes)
Which of the following is TRUE?
a. A string is nothing but an integer of characters.
b. A string is nothing but a dictionary of characters.
c. A string is nothing but a float of characters.
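d. A string is nothing but a list of characters. - answer: d. A string is nothing but a list of
characters. (More precisely, a string is an immutable sequence of characters: it can be indexed and sliced like a list, but its characters cannot be changed in place.)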
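A short illustration of where the "list of characters" picture holds and where it breaks: indexing, slicing, and `list(s)` behave list-like, but assigning to a position raises `TypeError` because strings are immutable.

```python
s = "Class"
print(s[0], s[-1], s[1:3])   # C s la  (indexing and slicing work like a list)
print(list(s))               # ['C', 'l', 'a', 's', 's']

try:
    s[0] = "G"               # strings cannot be changed in place
except TypeError as err:
    print("TypeError:", err)

s2 = "G" + s[1:]             # build a new string instead
print(s2)                    # Glass
```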
Which of the following best describes overfitting?