MSCI 446 Midterm Practice Questions and Answers
What is unsupervised machine learning?
-Explain the past: taken at face value, what does the data tell you?
o Not interested in learning the future
-Segmentation, association
What is supervised machine learning?
-Predict the future based on the past, given data
-Predict the value of some variable (past) using the values of other variables
How does email spam filtering work?
Input: examples of legitimate emails and spam emails
Output: a program that predicts whether a given email is legitimate or spam. The result could be very bad depending on how general the learning is (e.g. keywords vs. exact sentences)
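The input/output description above can be sketched in a few lines of Python. This is a toy illustration only, not how a production spam filter works: the function names, the training emails, and the rule "a word seen only in spam is a spam keyword" are all assumptions made for the example. It contrasts a very general learner (keywords) with a very specific one (exact sentences).

```python
def train_keyword_filter(examples):
    """Learn words that appear only in spam; flag any email containing one."""
    spam_words, legit_words = set(), set()
    for text, is_spam in examples:
        (spam_words if is_spam else legit_words).update(text.lower().split())
    keywords = spam_words - legit_words  # words seen only in spam examples

    def predict(text):
        return any(word in keywords for word in text.lower().split())

    return predict


def train_exact_filter(examples):
    """Memorize whole spam emails; flag only exact repeats."""
    spam_sentences = {text.lower() for text, is_spam in examples if is_spam}

    def predict(text):
        return text.lower() in spam_sentences

    return predict


# Hypothetical labelled examples: (email text, is_spam)
examples = [
    ("win a free prize now", True),
    ("claim your free money", True),
    ("meeting moved to friday", False),
    ("your invoice is attached", False),
]

keyword_filter = train_keyword_filter(examples)
exact_filter = train_exact_filter(examples)

print(keyword_filter("free prize inside"))  # keyword learner generalizes to new spam
print(exact_filter("free prize inside"))    # exact-sentence learner misses it
print(keyword_filter("a quick question"))   # but "a" was learned as a spam keyword
```

The keyword learner catches unseen spam but also flags a harmless email because "a" happened to appear only in spam examples, while the exact-sentence learner never generalizes at all. Both failure modes depend on how general the learning is.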
Explain One Hot Encoding
Categorical -> Boolean/Binary: If the categorical variable has k values, one hot encoding creates k Boolean variables, exactly one of which will be true in any one record
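A minimal sketch of one hot encoding in plain Python, assuming a made-up colour column; the function name `one_hot_encode` is hypothetical and not taken from any library.

```python
def one_hot_encode(values):
    """Return (categories, rows): one Boolean per category for each record."""
    categories = sorted(set(values))  # the k distinct values
    rows = [[value == category for category in categories] for value in values]
    return categories, rows


# Hypothetical categorical column with k = 3 distinct values
colours = ["red", "green", "blue", "green"]
categories, rows = one_hot_encode(colours)

print(categories)
for value, row in zip(colours, rows):
    print(value, row)  # exactly one True per record
```

Each record produces k Booleans, and only the one matching that record's category is true.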
Explain Document Term Vectors
o A fragment of a vector from a short document with text
o Unlocks, for text, machine learning algorithms that require numeric values
o Consequence: Fails to recognize words that have similar semantics because they have different spellings
o BERT model: word embeddings
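As a rough illustration of document term vectors, the sketch below counts how often each vocabulary word occurs in each document. It assumes whitespace tokenization and lowercase text; the function name and the two sample sentences are made up for the example.

```python
from collections import Counter


def document_term_vectors(docs):
    """Return (vocabulary, vectors): one count per vocabulary term per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical documents that mean the same thing but use different words
docs = ["the movie was great", "the film was great"]
vocabulary, vectors = document_term_vectors(docs)

print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

The text becomes numeric, but "movie" and "film" land in separate, unrelated columns. That is the consequence noted above: similar meaning with different spelling goes unrecognized, which is the gap word embeddings aim to close.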
What is an outcome?
· The variable we want to predict
· If the outcome variable is categorical, we say we are doing classification
· Otherwise, we say we are doing numeric prediction
· Classifier: The algorithm for classification
What is a predictor?
· The variables we will use to predict the outcome variable
· Also referred to as explanatory variables or features
What is a good feature?
A good feature should be correlated with the label it predicts
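One rough way to check this is to compute the Pearson correlation between each candidate feature and the label. The sketch below is an assumption-laden illustration: the data is invented, the feature names are hypothetical, and correlation is only one possible screen (it captures linear association only).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: label 1 = spam, 0 = legitimate
label = [0, 0, 1, 1, 1, 0]
num_links = [0, 1, 4, 5, 3, 0]         # candidate feature: links in the email
day_of_month = [3, 17, 9, 22, 5, 28]   # candidate feature: day the email arrived

print(round(pearson(num_links, label), 2))
print(round(pearson(day_of_month, label), 2))
```

In this made-up data the link count tracks the label closely while the day of the month barely does, so the link count would be the better feature by this criterion.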
Data Quality