✔✔In class we walked through 5 steps to building a machine learning model. The
textbook also goes over in some depth the 5 steps. What is step 1? - ✔✔Choosing a
class of model
✔✔In class we walked through 5 steps to building a machine learning model. The
textbook also goes over in some depth the 5 steps. What is step 2? - ✔✔Choose
hyperparameters
✔✔In class we walked through 5 steps to building a machine learning model. The
textbook also goes over in some depth the 5 steps. What is step 3? - ✔✔Arrange data
✔✔In class we walked through 5 steps to building a machine learning model. The
textbook also goes over in some depth the 5 steps. What is step 4? - ✔✔Fit the model
✔✔In class we walked through 5 steps to building a machine learning model. The
textbook also goes over in some depth the 5 steps. What is step 5? - ✔✔Predict
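The five steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the course's own code: the choice of `LinearRegression`, the synthetic data, and the variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # Step 1: choose a class of model

# Synthetic data for illustration only (not from the course or textbook).
rng = np.random.RandomState(42)
x = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)

# Step 2: choose hyperparameters by instantiating the model class.
model = LinearRegression(fit_intercept=True)

# Step 3: arrange data into a features matrix (n_samples, n_features)
# and a target vector (n_samples,).
X = x[:, np.newaxis]

# Step 4: fit the model to the data.
model.fit(X, y)

# Step 5: predict for new, unseen feature values.
x_new = np.linspace(-1, 11, 5)
y_pred = model.predict(x_new[:, np.newaxis])
```

The same five-step pattern applies to other scikit-learn estimators; only the imported class and its hyperparameters change.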
✔✔What is the purpose of the below code?
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np - ✔✔Import Python packages
✔✔Your dataset consists of details about customer traits, such as "number of items in
the basket at checkout" and "time of day of checkout". Your task is to group customers
that are like each other together. You don't already have labeled customer types. What
kind of model are you building? - ✔✔Unsupervised model (like k-means)
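As a rough sketch of the customer-grouping task, k-means can be run on unlabeled data like this. The two feature columns, the synthetic customers, and the choice of two clusters are assumptions made for illustration, not details from the course.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical features per customer: [items in basket, hour of checkout].
small_morning = rng.normal(loc=[3, 9], scale=0.5, size=(30, 2))
large_evening = rng.normal(loc=[20, 19], scale=0.5, size=(30, 2))
X = np.vstack([small_morning, large_evening])

# No labels are passed in: the model groups similar rows on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

Each entry of `labels` is the cluster assigned to the corresponding customer. The cluster numbers themselves are arbitrary and carry no meaning beyond group membership.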
✔✔What is ONE reason the textbook lists for why linear regression models are a good
starting point in a modeling task? - ✔✔They are interpretable
✔✔What is the first variable in a decision tree called (before any of the branches)? -
✔✔Root
✔✔One problem with decision trees is that they are prone to - ✔✔Overfitting
✔✔If you are not careful, or do not set the __________________ appropriately, decision
trees can overfit - ✔✔Max depth
✔✔The random forest algorithm prevents, or at least avoids to some extent, the
problems with overfitting found in decision trees. (True or False) - ✔✔True
✔✔Random forests can only be used on classification problems (true or false) -
✔✔False
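To illustrate that random forests are not limited to classification, here is a minimal sketch using scikit-learn's `RandomForestRegressor` on a numeric target. The synthetic sine-wave data and the setting `n_estimators=100` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data for illustration only.
rng = np.random.RandomState(0)
X = 10 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# The forest averages the predictions of many decision trees,
# producing a continuous (numeric) output rather than a class label.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
y_pred = forest.predict(np.array([[1.0], [5.0]]))
```

For classification tasks the analogous class is `RandomForestClassifier`, with the same fit/predict pattern.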
✔✔In order to interpret decision trees it's necessary to first run a linear regression (true
or false) - ✔✔False
✔✔Decision trees are nice because they are fairly simple and straightforward to
interpret (True or False) - ✔✔True
✔✔When running our first decision tree, we took out "max_depth=". This had the
unfortunate result of... - ✔✔Building a very large, hard-to-understand tree
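The effect of leaving out the depth limit can be seen by comparing two trees. This is a sketch only: the iris dataset and the value `max_depth=2` are assumptions for illustration and are not necessarily what was used in the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Tree with a depth limit: small and easy to read.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Tree with no depth limit: keeps splitting until the leaves are pure.
unlimited = DecisionTreeClassifier(random_state=0).fit(X, y)

# Compare tree size: depth and number of terminal nodes (leaves).
print("limited:  ", shallow.get_depth(), "levels,", shallow.get_n_leaves(), "leaves")
print("unlimited:", unlimited.get_depth(), "levels,", unlimited.get_n_leaves(), "leaves")
```

The unlimited tree grows deeper and ends with more leaves, which makes it harder to interpret and more likely to fit noise in the training data.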
✔✔What is the terminal node as discussed in the lecture? - ✔✔The last node
(sometimes called a leaf); the tree doesn't split after this
✔✔Models, such as the random forest model we ran, often have a number of
parameters that the analyst can choose or set. What is the best source of up-to-date
information about the different parameters that can be set? - ✔✔The scikit-learn
documentation
✔✔Random forests are __________ interpretable than decision trees - ✔✔Less
✔✔Pipelines are useful (in the analytics with Python sense) for what reasons? -
✔✔Make it easy to repeat/replicate steps and run multiple models, help organize the
code you used to clean and treat data, and make it easy to change small things in
the model, like which variables to include.
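A minimal sketch of a scikit-learn pipeline shows these benefits in practice. The scaling step, the `LinearRegression` model, and the synthetic data are assumptions chosen for illustration; the course may have used different steps.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: three features, one numeric target.
rng = np.random.RandomState(1)
X = rng.rand(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.randn(100)

# One object holds both the data-treatment step and the model,
# so the same steps are repeated exactly on every fit and predict.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
y_hat = pipe.predict(X)

# Changing which variables to include is a one-line change:
# here the same pipeline recipe is refit on the first two columns only.
pipe_two = make_pipeline(StandardScaler(), LinearRegression()).fit(X[:, :2], y)
```

Swapping `LinearRegression()` for another estimator reuses the same preprocessing, which is what makes running multiple models convenient.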
✔✔Y and y-hat are a little different. Y is our target vector, and y-hat is an output in our
model that is a..... - ✔✔Estimate or prediction of y
✔✔The basic idea of a regression is very simple. We have some X values (we called
these ___________) and some Y value (this is the variable we are trying to _________).
We could have multiple Y values, but that is not something we have covered. -
✔✔Features; Predict
✔✔When looking at the code in the videos, we sometimes used a variable to hold our
model.
What is the significance of the word "model" in the below code?