QMB3302 FINAL EXAM
QUESTIONS WITH ACCURATE
SOLUTIONS
1) Pipelines are useful (in the analytics-with-Python sense) for which of the
following reasons? (choose all that apply)
- Pipelines make it easy to repeat/replicate steps and run
multiple models
- Pipelines are good for moving data into your programming
environment
- Pipelines automatically update to new versions of Python
- Pipelines help organize code you used to clean and treat your
data
- Pipelines make it very easy to change small things in your
model, like which variable to include -- Correct Answer ✔✔ -
Pipelines make it easy to repeat/replicate steps and run
multiple models
- Pipelines help organize code you used to clean and treat your
data
- Pipelines make it very easy to change small things in your
model, like which variable to include
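To illustrate the three correct choices, here is a minimal sketch using scikit-learn's Pipeline. The toy data, the step names ("impute", "scale", "model"), and the helper build_pipeline are made up for illustration and are not taken from the course materials.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up toy data: two feature columns, with one missing value in the second.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])


def build_pipeline():
    """Hypothetical helper: cleaning steps and the model live in one object."""
    return Pipeline(steps=[
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # standardize features
        ("model", LinearRegression()),                 # final estimator
    ])


# Repeating the same steps is one call; changing which variables are
# included only means passing different columns.
pipe_all = build_pipeline().fit(X, y)          # both features
pipe_one = build_pipeline().fit(X[:, [0]], y)  # first feature only

preds_all = pipe_all.predict(X)
preds_one = pipe_one.predict(X[:, [0]])
```

Because the imputer and scaler are fitted inside the pipeline, the same cleaning is reapplied automatically whenever the pipeline is refit or used to predict.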
2) The basic idea of a regression is very simple. We have some X value
(which we call ______) and some Y value that we are trying to _____.
We could have multiple Y values, but that is not something we have
covered. -- Correct Answer ✔✔ features; predict
3) Y and y-hat are a little different. Y is our target vector, and y-hat is
an output in our model that is a.... (choose one of the following)
- estimate or prediction of y
- the actual value of y
- an axis on our 2 way graph
- a combination of XY intercept coordinates -- Correct Answer
✔✔ estimate or prediction of y
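The difference between y and y-hat can be sketched with a one-feature least-squares fit in plain Python. The numbers below are made up for illustration. The point is only that y holds the observed values and y_hat holds the model's estimates of them.

```python
# Made-up toy data: x is the single feature, y is the observed target vector.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares for a single feature.
slope = (
    sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    / sum((xi - x_mean) ** 2 for xi in x)
)
intercept = y_mean - slope * x_mean

# y_hat is the model's estimate of y at each x; it is not the actual y.
y_hat = [intercept + slope * xi for xi in x]

# Residuals measure how far each estimate is from the observed value.
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]
```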
4) When looking at the code in the videos, we sometimes used a
variable to hold our model. What is the significance of the word
"model" in the below code?
model = LinearRegression(fit_intercept=True) -- Correct Answer
✔✔ 'model' is a named variable and is just holding our linear
regression model. It could be renamed anything. The word itself
is not important. It is just a container.
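A short sketch of why the name does not matter: two identically configured estimators stored under different variable names behave the same. The data and the name my_regression are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data following y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# The variable name is only a label for the estimator object.
model = LinearRegression(fit_intercept=True)
my_regression = LinearRegression(fit_intercept=True)  # hypothetical alternative name

model.fit(X, y)
my_regression.fit(X, y)

# Both objects produce the same predictions regardless of what they are called.
same_predictions = np.allclose(model.predict(X), my_regression.predict(X))
```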
5) What is a good model fit value? -- Correct Answer ✔✔ unknowable
without knowing/understanding the context of the domain
6) Imagine X in the below is a missing value. If I were to run a median
imputer on this set of data, what would the return value be?
50, 60, 70, 80, 100, 60, 5000, X -- Correct Answer ✔✔ 70
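The arithmetic behind this answer can be checked with Python's standard library. This is a sketch of the idea, not the imputer used in the course: the missing value is represented as None, and the median is taken over the observed values only. Sorted, those are 50, 60, 60, 70, 80, 100, 5000, so the middle value is 70.

```python
from statistics import median

# The values from the question, with the missing entry X written as None.
raw = [50, 60, 70, 80, 100, 60, 5000, None]

# A median imputer ignores the missing entries when computing the fill value.
observed = [v for v in raw if v is not None]
fill_value = median(observed)

# Replace each missing entry with the median of the observed values.
imputed = [fill_value if v is None else v for v in raw]

# For contrast: the mean is pulled far upward by the outlier 5000.
mean_fill = sum(observed) / len(observed)
```

The contrast with the mean shows why the median is often preferred when a column has extreme outliers.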
7) Which of the below were discussed as being problems with the
holdout method for validation?
- Data is not available for test and control differences