✔✔The features in a model...
- are used as proxies for y-hat divided by y
- are always functions of each other
- keep the model validation process stable
- none of these answers are correct - ✔✔none of these answers are correct
✔✔What is the first variable in a decision tree called (before any of the branches)? -
✔✔root
✔✔One problem with decision trees is that they are prone to _____ if you are not
careful or do not set the _____ appropriately. - ✔✔overfitting; max depth
✔✔True or False: The random forest algorithm prevents, or at least avoids to some
extent, the problems with overfitting found in decision trees. - ✔✔True
✔✔True or False: Random Forests can only be used on classification problems -
✔✔False
✔✔True or False: In order to interpret decision trees, it is necessary to first run a linear
regression - ✔✔False
✔✔True or False: Decision trees are nice because they are fairly simple and
straightforward to interpret - ✔✔True
✔✔When running our first decision tree, we took out "max_depth=". This had the
unfortunate result of... - ✔✔Building a very large, hard-to-understand tree
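To see why leaving out a depth limit produces a large tree, here is a minimal from-scratch sketch. It is not scikit-learn's implementation: `build_tree`, `count_leaves`, and the toy data are hypothetical and only meant to illustrate the idea. In scikit-learn the corresponding parameter is spelled `max_depth`.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(xs, ys, max_depth=None, depth=0):
    """Grow a tree on one numeric feature; max_depth=None means no limit."""
    # Stop when the node is pure or the depth limit is reached.
    if len(set(ys)) == 1 or (max_depth is not None and depth >= max_depth):
        return {"leaf": Counter(ys).most_common(1)[0][0]}
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds: x <= t goes left
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # all x values identical: cannot split further
        return {"leaf": Counter(ys).most_common(1)[0][0]}
    t = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x <= t]
    hi = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in lo], [y for _, y in lo], max_depth, depth + 1),
        "right": build_tree([x for x, _ in hi], [y for _, y in hi], max_depth, depth + 1),
    }

def count_leaves(node):
    """Number of terminal nodes (leaves) in the tree."""
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Toy data: the label alternates, so an unlimited tree memorizes every point.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 1, 0, 1, 0, 1, 0, 1]

full = build_tree(xs, ys)                   # no depth limit
shallow = build_tree(xs, ys, max_depth=2)   # depth limit of 2
print(count_leaves(full), count_leaves(shallow))
```

Without a limit, the tree keeps splitting until every leaf is pure, which on noisy data means one leaf per record. A depth limit of d caps the tree at 2**d leaves.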
✔✔What is the terminal node as discussed in the lecture? - ✔✔The last node
(sometimes called a leaf if you google the term); the tree doesn't split after this
✔✔Models, such as the random forest model we ran, often have a number of
parameters that the analyst can choose or set.
What is the best source of up-to-date information about the different parameters that
can be set? - ✔✔The scikit-learn documentation
✔✔Random forests are _____ interpretable than decision trees. - ✔✔less
✔✔True or False: The correct number of clusters in Hierarchical clustering can be
determined precisely using approaches such as silhouette scores. - ✔✔False
✔✔True or False: In K Means clustering, the analyst does not need to determine the
number of clusters (K); it is always derived analytically by the k-means
algorithm. - ✔✔False
✔✔True or False: One big difference between the unsupervised approaches in this
module and the supervised approaches in prior modules: unsupervised models do not
have a target variable (Y). This makes it difficult to know when they are "right" or correct
- ✔✔True
✔✔According to the documentation, a silhouette score of 1 is _____, and -1 is _____. -
✔✔the best score; the worst score
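As a rough illustration of why the score runs from -1 to 1, the sketch below computes a mean silhouette coefficient from scratch. The `silhouette` function and the four points are hypothetical, and the code assumes at least two clusters and no duplicate points; it is not scikit-learn's own implementation.

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:                 # singleton cluster: score taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)     # mean distance within the point's own cluster
        b = min(                    # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(points, [0, 0, 1, 1])   # tight, well-separated clusters
bad = silhouette(points, [0, 1, 0, 1])    # each point grouped with a far one
print(round(good, 3), round(bad, 3))
```

The sensible grouping scores close to 1, while the mismatched grouping scores below 0 because each point is nearer to the other cluster than to its own.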
✔✔Select all that apply. Imagine you have a data set with columns/inputs for
customers:
Column 1 = Customer ID (a number)
Column 2 = Sales (a dollar value)
Column 3 = Frequency (a number)
Column 4 = Satisfaction (a number)
You would like to understand the impact of Frequency on customer Satisfaction. What
types of approaches could you use?
Note that the type of data is in parentheses () after the column name. Choose the best
answer(s) from the available choices below.
- Decision Tree
- K Means
- Random Forest
- Linear Regression
- Hierarchical Clustering - ✔✔- Decision Tree
- Random Forest
- Linear Regression
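All three of these are supervised methods, so they can relate an input (Frequency) to a target (Satisfaction). The simplest case, a one-variable least-squares fit, can be sketched as follows. The numbers are invented purely for illustration and are not from any course data set.

```python
from statistics import mean

# Hypothetical toy values, invented purely for illustration.
frequency = [1, 2, 3, 4, 5]
satisfaction = [2.0, 2.5, 3.5, 4.0, 5.0]

x_bar, y_bar = mean(frequency), mean(satisfaction)

# Least-squares slope: covariance of x and y divided by the variance of x.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(frequency, satisfaction))
         / sum((x - x_bar) ** 2 for x in frequency))
intercept = y_bar - slope * x_bar
print(slope, intercept)
```

The slope estimates how much Satisfaction changes for each one-unit increase in Frequency. The clustering methods have no target variable, so they cannot answer this kind of question.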
✔✔Select all that apply. Imagine you have a data set with columns/inputs for
customers:
Column 1 = Customer ID
Column 2 = Distance to Stores
Column 3 = Year spend
Column 4 = Likelihood to return
What kind of approach(es) could you use to understand more about these customers?
- Regression - to understand the effect of one or more variables on the others
- Clustering - to develop groups of customers that have similar patterns - ✔✔-
Regression - to understand the effect of one or more variables on the others
- Clustering - to develop groups of customers that have similar patterns
✔✔What is the purpose of the following code:
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
rfm_std = scale.fit_transform(df) - ✔✔to standardize the data
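For intuition, the per-column arithmetic behind standardization can be sketched with the standard library alone. The `standardize` helper and the sample values are hypothetical; the sketch assumes a column with non-zero spread and uses the population standard deviation, which matches `StandardScaler`'s default behavior.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores: subtract the column mean, divide by the population std."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

sales = [100.0, 200.0, 300.0]   # hypothetical dollar values
z = standardize(sales)
print(z)
```

After the transform each column has mean 0 and standard deviation 1, so a column measured in dollars no longer dominates distance calculations over a column measured in small counts.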
✔✔True or False: The elbow method provides an exact number of clusters for a k-means
algorithm - ✔✔False
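A small sketch of the elbow idea: run k-means for several values of k and compare the within-cluster sum of squares. The `kmeans_inertia` helper, the one-dimensional data, and the seed are all assumptions made for illustration; this is a bare-bones k-means, not scikit-learn's.

```python
import random

def kmeans_inertia(points, k, iters=20, seed=0):
    """Run a basic 1-D k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assign to the nearest centroid
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep empty clusters in place.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]   # three loose groups
for k in range(1, 6):
    print(k, round(kmeans_inertia(data, k), 3))
```

The printed values drop sharply at first and then flatten. Where exactly the curve "bends" is a judgment call by the analyst, which is why the elbow method suggests a range rather than an exact K.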
✔✔True or False: Hierarchical clustering is more powerful than k-means, as it allows the
researcher to determine the exact number of clusters to use in the analysis - ✔✔False
✔✔In k-means, the algorithm runs multiple iterations. If we have a simple 2D problem
and k = 2, it begins by assigning the first centroids to _____, and then _____ from each
point or record to the centroid. - ✔✔a random initial starting point; measuring the
distance
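The iteration described above can be sketched as follows: a random start, then repeated assign and update steps. The `kmeans` function, the six points, and the fixed seed are hypothetical choices for illustration, not the course's code.

```python
import random
from math import dist

def kmeans(points, k=2, iters=10, seed=0):
    """Basic k-means: random start, then repeat the assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # 1. random initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # 2. Measure the distance from each point to each centroid
        #    and assign the point to the nearest one.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # 3. Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                          # leave empty clusters in place
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points)
print(labels, centroids)
```

On this well-separated toy data the two groups are recovered. Which group gets label 0 and which gets label 1 depends on the random start.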
✔✔An example this week was done in a Jupyter-like environment called Google Colab.
What was the language that was demonstrated in the videos? - ✔✔TensorFlow
✔✔True or False: Neural Networks in computing are exactly the same as neural
networks from biology. - ✔✔False
✔✔When viewing a diagram of a neural network there are several layers. Match the
layer to the description below:
Input Layer
Options:
- these are the X's, or inputs from your data
- these are the Y (the target variable you are interested in)
- something you don't see, here there is some computation to transform the X's to the Y
- the layer that translates the axons - ✔✔these are the X's, or inputs from your data
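To make the layer roles concrete, here is a minimal forward pass through a tiny network with one hidden layer. Everything in it is an assumption for illustration: the `forward` function, the sigmoid activation, the omission of bias terms, and the arbitrary untrained weights. It is not the TensorFlow code from the videos.

```python
from math import exp

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def forward(x, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer -> output layer."""
    # Hidden layer: each unit takes a weighted sum of the inputs,
    # then applies a nonlinearity. This is the computation you don't see.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(unit, x)))
              for unit in hidden_weights]
    # Output layer: a weighted sum of the hidden values gives the prediction.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

x = [0.5, -1.0]                               # input layer: the X's (made-up values)
hidden_weights = [[0.1, 0.4], [-0.3, 0.2]]    # arbitrary, untrained weights
output_weights = [0.7, -0.5]
y_hat = forward(x, hidden_weights, output_weights)
print(y_hat)
```

Training would adjust the weights so that `y_hat` moves closer to the known Y values; this sketch only shows how data flows from the inputs to a prediction.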
✔✔When viewing a diagram of a neural network there are several layers. Match the
layer to the description below:
Output Layer
Options: