SOLUTIONS GRADED A+
✔✔True or False: Neural networks are an unsupervised technique, because there is no
target variable. - ✔✔False
✔✔NLP stands for... - ✔✔natural language processing
✔✔Tokenization, as defined in the lecture, is... - ✔✔A computer turning letters and/or
words into something it can read and understand, such as numbers
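A minimal sketch of that idea, assuming a simple word-level scheme in which each distinct word gets an integer ID. The helper names `build_vocab` and `tokenize`, the `unknown_id` value, and the sample sentences are hypothetical and are not taken from the lecture:

```python
# Word-level tokenization sketch: turn words into integer IDs.
# build_vocab, tokenize, and unknown_id are illustrative names, not lecture code.

def build_vocab(texts):
    """Give every distinct lowercase word an integer ID, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def tokenize(text, vocab, unknown_id=-1):
    """Convert a sentence into a list of integer IDs; unseen words map to unknown_id."""
    return [vocab.get(word, unknown_id) for word in text.lower().split()]


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
token_ids = tokenize("the dog sat", vocab)

print(vocab)      # word -> integer ID mapping
print(token_ids)  # the sentence as numbers
```

Real NLP libraries usually split text into sub-word pieces rather than whole words, but the core step is the same: text in, numbers out.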
✔✔Recommenders come in many flavors. Two of the most common, often used together
and discussed in the lecture, are: (choose all that apply)
- Item Based
- User Based
- Algorithm Oriented
- Stock Availability Based
- Syntax Dependent - ✔✔- Item Based
- User Based
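To see the difference between the two flavors, here is a small sketch that assumes cosine similarity as the similarity measure (the lecture may have used a different one). The ratings table and all names are made up for illustration. User-based compares rows (users); item-based compares columns (items):

```python
import math

# Hypothetical ratings table: one row per user, one column per item (0 = not rated).
ratings = {
    "ann": {"book": 5, "film": 4, "game": 0},
    "bob": {"book": 4, "film": 5, "game": 1},
    "cat": {"book": 1, "film": 0, "game": 5},
}
items = ["book", "film", "game"]


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_vector(user):
    """All ratings given by one user (a row of the table)."""
    return [ratings[user][item] for item in items]


def item_vector(item):
    """All ratings received by one item (a column of the table)."""
    return [ratings[user][item] for user in ratings]


# User based: which users rate things alike?
user_sim = cosine(user_vector("ann"), user_vector("bob"))
user_sim_far = cosine(user_vector("ann"), user_vector("cat"))

# Item based: which items are rated alike?
item_sim = cosine(item_vector("book"), item_vector("film"))
item_sim_far = cosine(item_vector("book"), item_vector("game"))

print(user_sim, user_sim_far)
print(item_sim, item_sim_far)
```

In this toy table, ann and bob have similar tastes, and book and film attract similar ratings, so both of those pairs score higher than the mismatched pairs.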
✔✔Imagine you have a dataset with 2 columns, both filled with continuous numbers.
You believe the first column is a predictor of the second column. Which of the model
approaches below could work when building a model?
- Regression
- Decision Tree
- Running .describe and .info on the data
- Graphing
- Random Forest - ✔✔- Regression
- Decision Tree
- Random Forest
✔✔Decision trees have a few problems. The problem we talked about the most is: -
✔✔overfitting
✔✔In y = ax + b,
a is commonly known as the _____ and b is commonly known as the _____. - ✔✔slope;
intercept
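As a worked illustration, the sketch below recovers the slope and intercept from a handful of points using the standard least-squares formulas. The function name `fit_line` and the sample data are assumptions made for this example:

```python
# Least-squares fit of y = a*x + b, where a is the slope and b is the intercept.
# fit_line and the sample points are illustrative, not from the course notebooks.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)
```

The slope says how much y changes per unit of x; the intercept is the value of y when x is 0.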
✔✔True or False: The LinearRegression estimator is only capable of simple straight-line
fits. - ✔✔False
✔✔In class we walked through 5 steps to building a machine learning model. The
textbook goes over the 5 steps in some depth. What are they? - ✔✔First Step: choosing
a class of model
Second Step: choosing hyperparameters
Third Step: arrange data
Fourth Step: fit the model
Fifth Step: predict
✔✔What is the purpose of the below code?
Note that this is probably EASIER than similar questions on the final exam. But I will ask
you why/purpose/what for questions on the code I have had you run. It is useful to make
notes on your notebooks about why a certain chunk of code is run.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np - ✔✔Import Python packages (matplotlib and seaborn for plotting, NumPy for numerical arrays)
✔✔Choosing a class of models
Your data set consists of details about customer traits, such as "number of items in the
basket at checkout". Your task is to group customers that are like each other together.
You don't already have labeled customer types. What kind of model are you building? -
✔✔unsupervised model (such as K-means)
- Reminder: if you have a bunch of Xs but no Ys, the problem is unsupervised. When you
are building a supervised model, you have both an X and a Y. The hint here is "You don't
already have labeled customer types": without these labels (the Y), you can't have any
supervision.
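To make the "Xs but no Ys" point concrete, here is a minimal one-dimensional K-means sketch in plain Python. The basket sizes are invented, the function name `kmeans_1d` is hypothetical, and the deterministic choice of starting centers is a simplification (library implementations typically use randomized initialization). No labels are passed in, only the number of clusters k:

```python
# Minimal 1-D K-means sketch: group customers by basket size with no labels (no Y).
# kmeans_1d and the data are illustrative assumptions, not course code.

def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups; returns (labels, centers)."""
    ordered = sorted(values)
    # Simplified, deterministic start: spread initial centers across the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [float(ordered[round(i * step)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


basket_sizes = [2, 3, 4, 20, 22, 25]  # X only; there is no target column
labels, centers = kmeans_1d(basket_sizes, k=2)
print(labels)
print(centers)
```

The algorithm discovers a "small basket" group and a "large basket" group on its own, but the analyst still had to supply k.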
✔✔What is ONE reason the textbook lists for why linear regression is a good starting
point in a modeling task? - ✔✔Linear regression models are interpretable
✔✔The correct number of clusters in Hierarchical clustering can be determined
precisely using approaches such as silhouette scores (True or False) - ✔✔False
✔✔In K-means clustering, the analyst does not need to determine the number of
clusters (K); it is always derived analytically by the K-means algorithm. (True or
False) - ✔✔False
✔✔One big difference between the unsupervised approaches in this module, and the
supervised approaches in prior modules: Unsupervised models do not have a target
variable (Y). This makes it difficult to know when they are "right" or correct. (True or
False) - ✔✔True
✔✔According to the documentation, a silhouette score of 1 is... - ✔✔The best score
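For intuition about why 1 is the best score, the sketch below computes a silhouette score by hand for one-dimensional data. For each point, a is the mean distance to the other points in its own cluster, b is the mean distance to the points in the nearest other cluster, and the point's score is (b - a) / max(a, b). The function name `silhouette_score_1d` and the data are assumptions for illustration; the sketch expects at least two clusters and points that are not all identical:

```python
# Hand-rolled silhouette score for 1-D data (illustrative; not the library implementation).

def silhouette_score_1d(values, labels):
    """Mean silhouette over all points; ranges from -1 (worst) to 1 (best)."""
    clusters = set(labels)
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        # Distances to the other members of this point's own cluster.
        same = [abs(v - w) for j, (w, l) in enumerate(zip(values, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # convention: a single-point cluster scores 0
            continue
        a = sum(same) / len(same)
        # Mean distance to each other cluster; keep the nearest one.
        b = min(
            sum(abs(v - w) for w, l in zip(values, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


values = [1.0, 1.1, 0.9, 10.0, 10.1, 9.9]
good = silhouette_score_1d(values, [0, 0, 0, 1, 1, 1])  # tight, well-separated clusters
bad = silhouette_score_1d(values, [0, 1, 0, 1, 0, 1])   # clusters mixed together
print(good, bad)
```

Tight, well-separated clusters push the score toward 1, while mixed-up assignments pull it down toward 0 or below.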