CS7641 Machine Learning Final Verified Exam Study
Set, 2026/2027: Machine Learning Examination
Questions with Answers and Rationales
What are the parts of a classification problem? - Answer--
Instances, the input data
- The concept, a function that maps inputs to outputs
- The target concept, the answer we want
- The hypothesis class, the set of all candidate functions
we are willing to consider
- The sample (training set), inputs paired with their
correct outputs
- A candidate, a potential target concept
- A testing set, to evaluate how close the candidate is to
the target concept
How does the ID3 decision tree algorithm work? - Answer--
Find the best attribute, the one with the most information
gain
- Create a branch for each possible value of that attribute
- Sort the training examples into their respective branches
- Repeat on each branch until its examples all share one
label or no attributes remain
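A minimal sketch of the steps above, assuming categorical
attributes stored as dicts and a majority-label leaf when no
attributes remain. The names entropy, information_gain, id3
and predict are illustrative, not from the course material.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Entropy before the split minus the weighted entropy of each branch.
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr], []).append(label)
    remainder = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    # Stop when the node is pure or no attributes remain (assumption: majority vote).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {"attr": best, "branches": {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        tree["branches"][value] = id3(list(sub_rows), list(sub_labels), remaining)
    return tree

def predict(tree, row):
    # Walk down the tree until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree
```

The sketch fails with a KeyError on attribute values never
seen in training; a fuller version would need a fallback.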
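How is information gain in decision trees measured? -
Answer-- Entropy reduction: the entropy of the labels
before the split minus the weighted average entropy of the
labels in each branch after it. The best split leaves each
branch as close to a single label as possible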
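A worked example of the entropy calculation, as a hedged
sketch. The class counts (9 positive, 5 negative, split into
branches of 6/2 and 3/3) are made-up illustration data, and
the function names are hypothetical.

```python
import math

def entropy_from_counts(counts):
    # Entropy in bits from per-class counts; empty classes contribute nothing.
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_from_counts(parent, children):
    # Parent entropy minus the size-weighted entropy of the child branches.
    total = sum(parent)
    remainder = sum(sum(child) / total * entropy_from_counts(child) for child in children)
    return entropy_from_counts(parent) - remainder

parent = [9, 5]               # 9 positive, 5 negative examples
children = [[6, 2], [3, 3]]   # class counts in each branch after the split

print(round(entropy_from_counts(parent), 3))         # 0.94
print(round(gain_from_counts(parent, children), 3))  # 0.048
```

A gain this close to zero means the split barely helps; a
split into pure branches would recover the full 0.94 bits.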
What are the two kinds of bias to worry about when
designing a classifier? - Answer-- Restriction bias: we only
consider candidate functions that can be represented by
our classifier
- Preference bias: we prefer certain kinds of hypotheses
over others
What is the preference bias of the ID3 decision tree
algorithm? - Answer-- Prefers trees with good splits at the
top (greedy)
- Prefers correct decision trees over incorrect ones
- Prefers shallower trees
How does a (boolean) decision tree handle continuous
values? - Answer-- Binning: ask if the value is greater than
or less than a particular value
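One common way to pick the cut-off, sketched under the
assumption that candidate thresholds are the midpoints
between sorted distinct values and that the best one
maximizes information gain. best_threshold is a
hypothetical helper name.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the boolean test `value > threshold`."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # midpoint between neighbouring values
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - weighted > best[1]:
            best = (t, base - weighted)
    return best

ages = [18, 22, 25, 40, 52, 61]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought)  # threshold is 32.5 here
```

Unlike a categorical attribute, the same continuous
attribute can be tested again deeper in the tree with a
different threshold.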
What is decision tree pruning? - Answer-- Remove the
branches whose removal costs the least accuracy, usually
judged on held-out validation data rather than on the
training set
Why is decision tree pruning used? - Answer-- To avoid
overfitting on training data
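How do you adapt a decision tree to regression problems?
- Answer-- Information gain does not apply to continuous
outputs, so we split where the outputs within each branch
have the least variance (or the best fit, e.g. the highest
correlation). Each leaf then predicts the average of its
examples or a local linear fit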
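A small sketch of a variance-based split on one numeric
feature, assuming midpoint thresholds and leaves that
predict the mean of their examples. The function names and
the sample data are illustrative only.

```python
from statistics import mean, pvariance

def weighted_variance(groups):
    # Size-weighted population variance of the outputs in each branch.
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * pvariance(g) for g in groups if g)

def best_regression_split(xs, ys):
    """Return (threshold, score); threshold is None if no split beats the parent."""
    distinct = sorted(set(xs))
    best_t, best_score = None, pvariance(ys)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = weighted_variance([left, right])
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, score = best_regression_split(xs, ys)
left_leaf = mean(y for x, y in zip(xs, ys) if x <= threshold)   # leaf prediction
right_leaf = mean(y for x, y in zip(xs, ys) if x > threshold)   # leaf prediction
```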
What is classification? - Answer-- Mapping complex inputs
to a label
What is regression? - Answer-- Mapping complex inputs to
a numeric value, usually continuous
What is overfitting? - Answer-- The model is fit too closely
to the training data and no longer generalizes to
real-world data
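What is cross-validation? - Answer-- Split the training data
into k folds that act as "fake" train and test sets: train
on k-1 folds, validate on the held-out fold, rotate through
all folds, and average the error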
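A rough sketch of k-fold cross-validation in plain Python.
The fit and score callables are hypothetical placeholders
for whatever learner is being evaluated, and the folds are
taken in order, so the data is assumed to be shuffled
beforehand.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(xs, ys, k, fit, score):
    # Train on k-1 folds, score on the held-out fold, then average the scores.
    scores = []
    for train, validation in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in validation], [ys[i] for i in validation]))
    return sum(scores) / len(scores)
```

The real test set stays untouched throughout; only the
training data is divided.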
When is ensemble learning useful? - Answer-- When there
are features that are inconclusive on their own, but
combining them is much more conclusive
What's the general approach to ensemble learning
algorithms? - Answer-- Split the training data into smaller
subsets, learn their rules, then combine them into a
collective decision-maker
In ensemble learning, what is bagging? - Answer--
Bootstrap aggregation: choose data randomly with
replacement to form the subset, and take the average of
the result from each learner
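A minimal sketch of bagging for a numeric prediction. It
assumes each bootstrap sample is as large as the training
set by default, and fit_mean is a deliberately trivial,
hypothetical base learner used only to keep the example
short.

```python
import random
from statistics import mean

def bootstrap_sample(xs, ys, rng, size=None):
    # Draw examples uniformly at random WITH replacement.
    size = len(xs) if size is None else size
    picks = [rng.randrange(len(xs)) for _ in range(size)]
    return [xs[i] for i in picks], [ys[i] for i in picks]

def bagging_fit(xs, ys, fit, n_learners=10, seed=0):
    # Train one learner per bootstrap sample.
    rng = random.Random(seed)
    return [fit(*bootstrap_sample(xs, ys, rng)) for _ in range(n_learners)]

def bagging_predict(models, x):
    # Combine the learners by averaging their outputs.
    return mean([model(x) for model in models])

def fit_mean(xs, ys):
    # Hypothetical base learner: ignores x and predicts the mean of its sample.
    avg = mean(ys)
    return lambda x: avg

models = bagging_fit([1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0], fit_mean, n_learners=5)
prediction = bagging_predict(models, 2)
```

For classification, the averaging step would typically be
replaced by a majority vote.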
In ensemble learning, what is boosting? - Answer-- Weight
all training examples equally to begin with
- After each 'round', find the learner with the lowest error,
and raise the weights of the examples it misclassified
- Combine the learners together with weights proportional
to their accuracy
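The rounds above can be sketched with AdaBoost, one
specific boosting algorithm chosen here for illustration.
Assumptions: labels are +1 or -1, the weak learners are
threshold stumps on a single numeric feature, and each
learner's vote is alpha = 0.5 * ln((1 - error) / error),
which grows with its weighted accuracy.

```python
import math

def stump_predict(stump, x):
    threshold, sign = stump
    return sign if x > threshold else -sign

def best_stump(xs, ys, weights):
    """Return (weighted_error, stump) for the stump with the lowest weighted error."""
    points = sorted(set(xs))
    thresholds = [points[0] - 1] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((t, sign), x) != y)
            if best is None or err < best[0]:
                best = (err, (t, sign))
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1 / n] * n  # every example starts with equal weight
    ensemble = []
    for _ in range(rounds):
        err, stump = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples (y * h(x) == -1) gain weight; then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    # Weighted vote of all learners.
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump can separate this pattern
ensemble = adaboost(xs, ys, rounds=3)
```

No single stump gets this data right, but the weighted vote
of three stumps classifies every training point correctly.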
What is the definition of a weak learner? - Answer-- Any
learner that performs even slightly better than random
chance on any distribution of data