CS 7641 FINAL - SUPERVISED LEARNING QUESTIONS
WITH VERIFIED ACCURATE ANSWERS
What is classification? - Answers - - Mapping complex inputs to a label
What is regression? - Answers - - Mapping complex inputs to a numeric value, usually
continuous
What is overfitting? - Answers - - A model is fit so closely to the training data that it no
longer generalizes to unseen, real-world data
What is cross-validation? - Answers - - Repeatedly split the training data into "fake" train
and test sets, so generalization error can be estimated without touching the real test set
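The "fake" train/test splits above can be sketched as a k-fold split (a minimal stdlib-only sketch; the name k_fold_splits is illustrative, not a library API):

```python
import random

def k_fold_splits(data, k=5, seed=0):
    """Yield k (train, test) pairs; each example lands in a test set exactly once."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k roughly equal folds
    for i in range(k):
        test = [data[j] for j in folds[i]]
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_splits(list(range(10)), k=5))
```

Each of the 5 splits trains on 8 examples and tests on the 2 held out, so every example is scored exactly once.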
What are the parts of a classification problem? - Answers - - Instances, the input data
- The concept, the function the data represents
- The target concept, the answer we want
- The hypotheses -- all possible target concept functions
- Samples paired to the correct output
- A candidate, a potential target concept
- A testing set, to evaluate how close the candidate is to the target concept
How does the ID3 decision tree algorithm work? - Answers - - Find the best attribute,
the one with the most information gain
- Create a branch for each possible value of that attribute
- Lump the training data into its respective branch
- Recurse on each branch until the examples at a leaf all share one label (or attributes run out)
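The steps above can be sketched as a bare-bones ID3 (stdlib only; id3 returns either a leaf label or an (attribute, branches) pair — names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """rows: list of dicts; returns a leaf label or (attribute, {value: subtree})."""
    if len(set(labels)) == 1:          # pure leaf: done
        return labels[0]
    if not attrs:                      # out of attributes: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                       # information gain of splitting on a
        rem = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - rem

    best = max(attrs, key=gain)        # best attribute = most information gain
    branches = {}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        branches[v] = id3(sub_rows, sub_labels, [a for a in attrs if a != best])
    return (best, branches)

rows = [{"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain", "wind": "weak"},
        {"outlook": "rain", "wind": "strong"}]
labels = ["no", "no", "yes", "yes"]
tree = id3(rows, labels, ["outlook", "wind"])
```

On this toy data "outlook" fully determines the label, so it has the highest gain and becomes the root, with pure leaves under each value.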
How is information gain in decision trees measured? - Answers - - Entropy reduction: the
decrease in label uncertainty (entropy) from the parent node to the size-weighted average
of its children
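As a quick worked example (stdlib only), the gain of a split is the parent entropy minus the size-weighted entropy of the child groups:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in children)

# a perfectly clean split recovers the full 1 bit of parent entropy...
clean = information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]])
# ...while a split that leaves both children just as mixed gains nothing
useless = information_gain(["a", "a", "b", "b"], [["a", "b"], ["a", "b"]])
```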
What are the two kinds of bias to worry about when designing a classifier? - Answers - -
Restriction bias: we only consider candidate functions that can be represented by our
classifier
- Preference bias: we prefer certain kinds of hypotheses over others
What is the preference bias of the ID3 decision tree algorithm? - Answers - - Prefers
trees with good splits at the top (greedy)
- Prefers correct decision trees over incorrect ones
- Prefers shallower trees
How does a (boolean) decision tree handle continuous values? - Answers - - Binning:
ask whether the value is greater than or less than a particular threshold
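A common way to pick those binning thresholds (a sketch; midpoints between consecutive distinct values are the usual candidates):

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct values:
    each one is a candidate 'is x > t?' boolean split."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

ts = candidate_thresholds([4, 1, 2, 2])
```

Here the distinct sorted values are 1, 2, 4, giving candidate thresholds 1.5 and 3.0; each candidate is scored by information gain like any discrete attribute.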
What is decision tree pruning? - Answers - - Remove the branches with the least effect
on training error
Why is decision tree pruning used? - Answers - - To avoid overfitting on training data
How do you adapt a decision tree to regression problems? - Answers - - Information
gain over labels is no longer useful, so split on the value that leaves each group with the
least variance in the outputs (or the strongest correlation with them)
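The least-variance criterion can be sketched the same way information gain was, with variance in place of entropy (names illustrative):

```python
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def variance_reduction(parent, children):
    """Parent output variance minus the size-weighted variance of the child groups."""
    n = len(parent)
    return variance(parent) - sum(len(g) / n * variance(g) for g in children)

# splitting [1, 1, 5, 5] into its two clusters removes all the variance
r = variance_reduction([1, 1, 5, 5], [[1, 1], [5, 5]])
```

The parent has variance 4 and both children have variance 0, so this split scores a reduction of 4; the split with the largest reduction is chosen, just as the largest entropy reduction is for classification.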
When is ensemble learning useful? - Answers - - When there are features that are
inconclusive on their own, but combining them is much more conclusive
What's the general approach to ensemble learning algorithms? - Answers - - Split the
training data into smaller subsets, learn their rules, then combine them into a collective
decision-maker
In ensemble learning, what is bagging? - Answers - - Bootstrap aggregation: choose
data randomly with replacement to form the subset, and take the average of the result
from each learner
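Bagging under that definition might look like this toy sketch (stdlib only, with trivial "mean" learners; real bagging trains full models such as trees on each bootstrap sample):

```python
import random

def bootstrap_sample(data, rng):
    """Sample len(data) points with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = [1.0, 2.0, 3.0, 4.0]

# each 'learner' here is just the mean of its own bootstrap sample
learners = []
for _ in range(200):
    sample = bootstrap_sample(data, rng)
    learners.append(sum(sample) / len(sample))

# the bagged prediction averages the individual learners' outputs
prediction = sum(learners) / len(learners)
```

Averaging many learners trained on resampled data keeps the prediction close to the true mean (2.5 here) while smoothing out any one learner's quirks; for classification, the average becomes a majority vote.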
In ensemble learning, what is boosting? - Answers - - Weight all training examples
equally to begin with
- After each 'round', find the learner with the lowest error, and raise the weights of the
examples it misclassified
- Combine the learners together with weights proportional to their accuracy
What is the definition of a weak learner? - Answers - - Any learner that performs even
slightly better than random chance on any distribution of data
What is the high-level AdaBoost algorithm? - Answers - - Start with a uniform
distribution to select weak learner subsets
- In the next distribution, weight each example inversely to how correctly the current learner
classified it
- This causes later rounds to focus more and more on the examples that are still misclassified
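A compact AdaBoost over 1-D threshold stumps, following the steps above (stdlib only; the brute-force stump search and all names are illustrative):

```python
import math

def adaboost(xs, ys, rounds=5):
    """xs: 1-D inputs; ys in {-1, +1}. Weak learners are 'sign if x > t else -sign' stumps."""
    n = len(xs)
    w = [1.0 / n] * n                  # start with a uniform distribution
    ensemble = []                      # (alpha, threshold, sign) triples
    for _ in range(rounds):
        best = None                    # pick the stump with the lowest weighted error
        for t in sorted(set(xs)):
            for s in (1, -1):
                preds = [s if x > t else -s for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, s, preds)
        err, t, s, preds = best
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))   # accuracy -> weight
        ensemble.append((alpha, t, s))
        # raise the weights of misclassified examples, shrink the rest, renormalize
        w = [wi * math.exp(-alpha * p * y) for wi, p, y in zip(w, preds, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the stumps, weights proportional to their accuracy (alpha)."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

model = adaboost([0.0, 1.0, 2.0, 3.0], [-1, -1, 1, 1])
```

On this separable toy data the first stump (threshold 1.0) is already perfect, so it dominates the vote; on harder data, later rounds concentrate weight on the examples earlier stumps got wrong.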
How does boosting perform with respect to overfitting? - Answers - - In general, training
performance is similar to test performance, because it prioritizes having high confidence
in its classifications
- Can still overfit if all the underlying learners do, or if there's uniform noise in the data
What is the motivation/purpose of a support vector machine? - Answers - - Find the
boundary line that separates data into categories, with the maximum margin between them
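A minimal illustration of that idea, not the full max-margin machinery: sub-gradient descent on the regularized hinge loss learns a linear boundary (1-D sketch, stdlib only; real SVMs solve for the maximum-margin hyperplane, often with kernels):

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """1-D linear SVM via sub-gradient descent on the regularized hinge loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:        # inside the margin: hinge gradient step
                w += lr * (y * x - lam * w)
                b += lr * y
            else:                          # outside the margin: regularization only
                w -= lr * lam * w
    return w, b

w, b = train_linear_svm([-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1])
```

The sign of w * x + b is the predicted category; the hinge loss pushes every example to the correct side with at least unit margin, while the lam term keeps w small, which is what widens the margin.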