QMB3302 FINAL EXAM, QMB3302 FINAL
UF EXAM QUESTIONS AND ANSWERS
GRADED A+ 2026
Pipelines are useful (in the analytics-with-Python sense) for which of the following reasons?
(choose all that apply)
- Pipelines make it easy to repeat/replicate steps and run multiple models
- Pipelines are good for moving data into your programming environment
- Pipelines automatically update to new versions of Python
- Pipelines help organize code you used to clean and treat your data
- Pipelines make it very easy to change small things in your model, like which variable to include
- ANS - Pipelines make it easy to repeat/replicate steps and run multiple models
- Pipelines help organize code you used to clean and treat your data
- Pipelines make it very easy to change small things in your model, like which variable to include
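A minimal sketch of these three points using scikit-learn's Pipeline. The toy data, the step names ("impute", "scale", "model"), and the choice of Ridge as a second model are assumptions made for illustration, not the course's own code.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features, one missing value.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# The cleaning steps and the model are organized in one object.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
pipe.fit(X, y)
predictions = pipe.predict(X)

# Swapping the model is a one-line change; the cleaning steps are reused as-is.
pipe.set_params(model=Ridge(alpha=1.0))
pipe.fit(X, y)
ridge_predictions = pipe.predict(X)
```

Because every step is replayed in the same order on each call to fit, the same cleaning is applied to each model that is tried.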
The basic idea of a regression is very simple. We have some X values (which we call ______) and
some Y value that we are trying to _____. We could have multiple Y values, but that is not
something we have covered. - ANS features; predict
Y and y-hat are a little different. Y is our target vector, and y-hat is an output in our model that is
a.... (choose one of the following)
- estimate or prediction of y
- the actual value of y
@COPYRIGHT 2026/2027 ALL RIGHTS RESERVED
- an axis on our 2-way graph
- a combination of XY intercept coordinates - ANS estimate or prediction of y
When looking at the code in the videos, we sometimes used a variable to hold our model. What
is the significance of the word "model" in the code below?
model = LinearRegression(fit_intercept=True) - ANS 'model' is a named variable and is just
holding our linear regression model. It could be renamed anything. The word itself is not
important. It is just a container.
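A short sketch of the same idea, assuming made-up data that follows y = 3x + 1. The variable name my_regression is hypothetical and chosen only to show that the name "model" carries no special meaning. The same example also shows y-hat as the model's prediction of y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 3x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # features: 2-D, one column per feature
y = np.array([1.0, 4.0, 7.0, 10.0])         # target vector

# Any valid variable name works; "my_regression" is used here instead of "model".
my_regression = LinearRegression(fit_intercept=True)
my_regression.fit(X, y)

# y-hat: the model's estimate (prediction) of y, not the actual values.
y_hat = my_regression.predict(X)
```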
What is a good model fit value? - ANS unknowable without knowing/understanding the
context of the domain
Imagine X in the list below is a missing value. If I were to run a median imputer on this set of
data, what value would be filled in for X?
50, 60, 70, 80, 100, 60, 5000, X - ANS 70
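The arithmetic can be checked with a plain-Python sketch that mirrors what a median imputer does: take the median of the observed values and use it to fill the gap. Using None to stand in for X is an assumption of this sketch; in scikit-learn the equivalent tool is SimpleImputer(strategy="median").

```python
from statistics import median

values = [50, 60, 70, 80, 100, 60, 5000, None]  # None stands in for the missing X

# Only the observed values take part in the calculation.
observed = [v for v in values if v is not None]

# Sorted, the observed values are 50, 60, 60, 70, 80, 100, 5000; the middle one is 70.
fill_value = median(observed)

# Replace the missing entry with the median.
imputed = [fill_value if v is None else v for v in values]
```

The outlier 5000 does not move the median, whereas the mean of the same seven values is roughly 774.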
Which of the below were discussed as being problems with the holdout method for validation?
- Data is not available for test and control differences
- Outliers can skew the result
- The model is not trained on all of the data
- K=3 is not sufficiently large enough
- Validation is sometimes too challenging - ANS - Outliers can skew the result
- The model is not trained on all of the data
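A sketch of the holdout method on synthetic data, with 5-fold cross-validation next to it for comparison. The data, the 25% test size, and the use of cross-validation as the comparison point are assumptions of this example rather than something stated in the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data: y depends linearly on two features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=60)

# Holdout: the model never sees the 25% of rows set aside for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
holdout_score = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# 5-fold cross-validation: every row is used for testing exactly once.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
```

With a single split, one unlucky draw (for example, an outlier landing in the small test set) can dominate the score, while the five fold scores give a sense of how much the result varies.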
The features in a model...
- are used as proxies for y-hat divided by y
- are always functions of each other
- keep the model validation process stable
- none of these answers are correct - ANS none of these answers are correct
What is the first variable in a decision tree called (before any of the branches)? - ANS root
One problem with decision trees is that they are prone to _____ if you are not careful or do not
set the _____ appropriately. - ANS overfitting; max depth
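A sketch of how this can be observed, assuming a synthetic, deliberately noisy classification dataset (the dataset and the choice of max_depth=3 are illustrative). A tree grown without a depth limit can memorize its training rows, so comparing training and test accuracy is one way to spot overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels flipped at random to act as noise.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One tree with no depth limit, one capped at three levels.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
capped = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Print training accuracy and test accuracy for each tree.
for name, tree in [("no max_depth", deep), ("max_depth=3", capped)]:
    print(name, tree.score(X_train, y_train), tree.score(X_test, y_test))
```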
True or False: The random forest algorithm prevents, or at least avoids to some extent, the
problems with overfitting found in decision trees. - ANS True
True or False: Random Forests can only be used on classification problems - ANS False
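As a sketch of why the answer is False: scikit-learn ships both a RandomForestClassifier and a RandomForestRegressor. The synthetic data and the n_estimators=50 setting below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))

# A numeric target for regression and a 0/1 label for classification.
y_numeric = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
y_label = (X[:, 0] > 5).astype(int)

reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_numeric)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_label)

numeric_predictions = reg.predict(X[:3])  # continuous values
label_predictions = clf.predict(X[:3])    # class labels (0 or 1)
```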
True or False: In order to interpret Decision Trees, it is necessary to first run a linear regression -
ANS False
True or False: Decision Trees are nice because they are fairly simple and straightforward to
interpret - ANS True
When running our first decision tree, we took out "max_depth=". This had the unfortunate result
of... - ANS Building a very large, hard-to-understand tree
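A sketch of the difference in size, assuming the iris dataset bundled with scikit-learn (not necessarily the data used in the lecture). A depth-limited tree can be printed as a handful of readable rules with export_text, while the unrestricted tree grows more leaves.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A small tree: at most two levels of splits below the root.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
small_tree.fit(iris.data, iris.target)

# Text rendering of the rules, from the root down to the leaves.
rules = export_text(small_tree, feature_names=list(iris.feature_names))
print(rules)

# The same model without a depth limit keeps splitting until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
print(small_tree.get_n_leaves(), full_tree.get_n_leaves())
```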
What is the terminal node as discussed in the lecture? - ANS The last node (sometimes called
a leaf if you google the term); the tree doesn't split after this
Models, such as the random forest model we ran, often have a number of parameters that the
analyst can choose or set.
What is the best source of up-to-date information about the different parameters that can be
set? - ANS The scikit-learn documentation
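The documentation remains the authoritative source. As a quick complement, the parameters of an estimator in the installed version can also be listed from code with get_params(). The values n_estimators=100 and max_depth=5 below are arbitrary choices for illustration.

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, max_depth=5)

# Dictionary of every parameter the analyst can set, with its current value.
params = forest.get_params()
for name in sorted(params):
    print(name, "=", params[name])
```

Calling help(RandomForestClassifier) prints the docstring for the installed version, which describes each parameter.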