CS7641 MACHINE LEARNING FINAL EXAM STUDY SET
2026/2027 | 75 Questions | Exam Prep

This study set covers the CS7641 Machine Learning final examination for the 2026/2027 academic cycle. It includes 75 questions with answers and rationales, focusing on core machine learning concepts and analytical problem solving. The material supports exam preparation by reinforcing supervised and unsupervised learning, model evaluation, optimization techniques, probability, and algorithm performance analysis.

Select the best answer for each question. Multiple-select items require selecting all correct responses.
Answers and rationales follow each question.




SECTION I: SUPERVISED LEARNING — DECISION TREES & NEURAL NETWORKS

1. Which of the following best describes the information gain used in decision tree learning?
A. The reduction in entropy achieved by splitting data on a given attribute
B. The increase in classification accuracy after pruning the tree
C. The total number of leaf nodes created during tree construction
D. The Gini impurity measured at the root node before any splits
Correct Answer: A. The reduction in entropy achieved by splitting data on a given attribute
Rationale: Information gain measures how much the entropy of the dataset decreases after partitioning
it on a specific attribute. It is defined as the difference between the parent node's entropy and the weighted
average entropy of the child nodes: IG(S, A) = Entropy(S) - Sum_v (|S_v| / |S|) * Entropy(S_v). Decision tree
algorithms such as ID3 select the attribute that maximizes information gain at each split. Option B
describes the effect of pruning but is not information gain itself. Option C refers to tree complexity. Option
D describes the Gini impurity at the root, which is a single metric rather than a gain measure.
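
To make the formula concrete, here is a minimal sketch (plain Python, standard library only; the class labels and the two child partitions are invented example data) that computes entropy and the information gain of one candidate split:

```python
# Minimal sketch: entropy and information gain for one candidate split.
# The labels and the two child partitions below are made-up example data.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = Entropy(parent) - weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["yes"] * 9 + ["no"] * 5                    # 9 positive, 5 negative
children = [["yes"] * 6 + ["no"] * 2,                # partition where A = a1
            ["yes"] * 3 + ["no"] * 3]                # partition where A = a2
print(round(information_gain(parent, children), 4))  # reduction in entropy
```

An ID3-style learner would evaluate this quantity for every candidate attribute and split on the one with the largest gain.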



2. A decision tree is trained on a dataset and achieves 99% training accuracy but only 68% test
accuracy. Which of the following is the MOST effective strategy to address this problem?
A. Increase the maximum depth of the tree to capture more patterns
B. Apply post-pruning by removing branches that provide minimal improvement on validation data
C. Use a larger training set without any regularization
D. Decrease the minimum number of samples required to split an internal node
Correct Answer: B. Apply post-pruning by removing branches that provide minimal
improvement on validation data
Rationale: The large gap between training accuracy (99%) and test accuracy (68%) is a clear indicator of
overfitting. The tree has memorized the training data rather than learning generalizable patterns. Post-pruning (for example, reduced error pruning) removes branches that do not significantly improve
classification accuracy on a held-out validation set, effectively simplifying the tree and improving
generalization. Increasing tree depth (A) would worsen overfitting. Using a larger training set (C) alone
does not prevent overfitting without regularization. Decreasing the minimum samples for splitting (D)
creates deeper, more complex trees, which exacerbates overfitting.
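
One concrete way to post-prune in practice is cost-complexity pruning, the post-pruning variant scikit-learn implements (reduced error pruning itself is not built in), with the pruning strength chosen on a held-out validation set. A minimal sketch on synthetic stand-in data:

```python
# Sketch: post-pruning via cost-complexity pruning, with the pruning strength
# (ccp_alpha) chosen on a held-out validation set. Synthetic data for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning levels derived from the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)   # branches that do not help validation
    if acc >= best_acc:              # accuracy get pruned away
        best_alpha, best_acc = alpha, acc

print(f"best ccp_alpha={best_alpha:.5f}, validation accuracy={best_acc:.3f}")
```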



3. Which activation function is most commonly used in the hidden layers of deep neural
networks to mitigate the vanishing gradient problem?
A. Sigmoid function: sigma(x) = 1 / (1 + exp(-x))
B. Hyperbolic tangent: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
C. Rectified Linear Unit (ReLU): f(x) = max(0, x)
D. Softmax function: softmax(xi) = exp(xi) / Sum(exp(xj))
Correct Answer: C. Rectified Linear Unit (ReLU): f(x) = max(0, x)
Rationale: The ReLU activation function (f(x) = max(0, x)) is the default choice for hidden layers in deep
networks because it helps mitigate the vanishing gradient problem. Unlike sigmoid and tanh, which
saturate for large absolute inputs (gradients approach zero), ReLU maintains a constant gradient of 1 for
all positive inputs, allowing gradients to flow more effectively through many layers during
backpropagation. Sigmoid (A) and tanh (B) both suffer from vanishing gradients in deep networks because
their derivatives are bounded above by 0.25 and 1.0 respectively. Softmax (D) is used in the output layer
for multi-class classification, not in hidden layers.
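
A quick numerical check (numpy; the probe inputs are arbitrary) of how the local derivatives compare: sigmoid and tanh derivatives collapse toward zero for large |x|, while ReLU's derivative stays at 1 for any positive input:

```python
# Compare local derivatives of sigmoid, tanh, and ReLU at a few inputs (numpy only).
import numpy as np

def d_sigmoid(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)              # peaks at 0.25 when x = 0

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2      # peaks at 1.0 when x = 0

def d_relu(x):
    return 1.0 if x > 0 else 0.0      # constant 1 for all positive inputs

for x in [0.0, 2.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={d_sigmoid(x):.6f}  "
          f"tanh'={d_tanh(x):.6f}  relu'={d_relu(x)}")
# sigmoid' and tanh' saturate toward 0 as |x| grows; relu' stays 1 for x > 0.
```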





4. In the backpropagation algorithm, the weight update rule for a weight w_ij connecting
neuron j to neuron i is given by w_ij := w_ij - eta * dL/dw_ij, where dL/dw_ij is the partial
derivative of the loss L with respect to w_ij. Which component controls the step size of each update?
A. The learning rate eta
B. The loss function L
C. The activation function output
D. The momentum coefficient
Correct Answer: A. The learning rate eta
Rationale: The learning rate (eta, also written as alpha) is a hyperparameter that controls the magnitude
of each weight update. A large learning rate causes larger steps but may overshoot minima and cause
divergence; a small learning rate leads to slow convergence but more stable updates. The loss function L
(B) defines the objective being minimized but does not directly control step size. The activation function (C)
transforms neuron outputs and affects gradient magnitude but is not the step size controller. The
momentum coefficient (D) is an optional addition that accelerates convergence in relevant directions by
accumulating past gradients.
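
A toy gradient descent run (pure Python, minimizing the hypothetical loss L(w) = w^2, whose gradient is 2w) shows eta directly scaling each step and, when set too large, causing divergence:

```python
# Toy illustration: minimize L(w) = w**2 (gradient 2w) with different learning rates.
def gradient_descent(eta, w=1.0, steps=20):
    for _ in range(steps):
        w = w - eta * 2 * w        # w := w - eta * dL/dw
    return w

for eta in [0.01, 0.1, 1.1]:
    print(f"eta={eta:>4}: w after 20 steps = {gradient_descent(eta):.4g}")
# Small eta converges slowly, moderate eta converges quickly, and eta > 1
# makes the per-step factor |1 - 2*eta| exceed 1, so the iterates diverge.
```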



5. Select all that apply. Which of the following are true statements about the Universal
Approximation Theorem? [Select All That Apply]
A. It guarantees that a single hidden layer feedforward neural network with a finite number of neurons
can approximate any continuous function on a compact subset of R^n
B. It proves that deeper networks always achieve better generalization than shallow networks
C. It requires the hidden layer to use a non-constant, bounded, and continuous activation function
D. It provides a constructive method for determining the optimal number of hidden neurons
E. It does not guarantee that the network can learn the approximation from training data in practice
Correct Answer: A. It guarantees that a single hidden layer feedforward neural network with
a finite number of neurons can approximate any continuous function on a compact subset of
R^n, C. It requires the hidden layer to use a non-constant, bounded, and continuous
activation function, E. It does not guarantee that the network can learn the approximation
from training data in practice
Rationale: The Universal Approximation Theorem (Cybenko 1989, Hornik 1991) states that a
feedforward network with a single hidden layer containing a finite number of neurons can approximate
any continuous function on a compact subset of R^n to arbitrary precision (A). The classical statements require a non-constant, bounded, and continuous activation function (C), such as the sigmoid; later work relaxed this to any non-polynomial activation. However,
the theorem is an existence result (E) — it does not specify how many neurons are needed, how to find the
right weights, or whether gradient-based training will converge to the desired approximation. Option B is
false: the theorem does not claim deeper networks generalize better; it states that a shallow network can
approximate any function in principle. Option D is false: the theorem provides no constructive guidance on
architecture selection.
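
A small numpy sketch of the theorem's flavor (the target function, hidden-layer width, and random-weight scale are all arbitrary choices, and the output weights are fit in closed form rather than by gradient training, sidestepping the learnability caveat in E): a single tanh hidden layer approximates a continuous function on a compact interval.

```python
# Minimal sketch: one tanh hidden layer with random fixed input weights,
# output weights solved by least squares, approximating a continuous function.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400)[:, None]   # compact subset of R
target = np.sin(3 * x) + 0.5 * x               # continuous target function

n_hidden = 100
W = rng.normal(scale=3.0, size=(1, n_hidden))  # random input weights (kept fixed)
b = rng.normal(scale=3.0, size=n_hidden)
H = np.tanh(x @ W + b)                         # hidden-layer activations

# Solve for the output weights in closed form (least squares).
w_out, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ w_out

print("max abs error:", np.max(np.abs(approx - target)))
```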



6. Which of the following correctly describes the Gini impurity index used in decision tree
algorithms such as CART?
A. Gini = 1 - Sum(pi^2) for all classes i, where pi is the probability of class i
B. Gini = -Sum(pi * log2(pi)) for all classes i, where pi is the probability of class i
C. Gini = Sum(pi * (1 - pi)) for all classes i, where pi is the proportion of class i
D. Both A and C are correct formulations of Gini impurity
Correct Answer: D. Both A and C are correct formulations of Gini impurity
Rationale: Both formulations are mathematically equivalent. The Gini impurity is defined as Gini = 1 - Sum(pi^2); since Sum(pi) = 1, this equals Sum(pi) - Sum(pi^2) = Sum(pi * (1 - pi)). When all instances belong to a single class
(pure node), Gini = 0. When instances are evenly distributed across classes, Gini reaches its maximum
value. Option B describes entropy, not Gini impurity. CART uses Gini impurity (or Gini index) as the
default splitting criterion for classification tasks, selecting the split that minimizes the weighted Gini
impurity of the child nodes.
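
The equivalence is easy to verify numerically; a minimal check with an arbitrary class distribution:

```python
# Verify that 1 - sum(p_i^2) equals sum(p_i * (1 - p_i)) for any distribution.
p = [0.5, 0.3, 0.2]                       # arbitrary class probabilities (sum to 1)

gini_a = 1 - sum(pi ** 2 for pi in p)     # formulation A
gini_c = sum(pi * (1 - pi) for pi in p)   # formulation C

print(gini_a, gini_c)                     # both are 0.62, up to float rounding
```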





7. A data scientist builds a decision tree to predict whether a loan application will be
approved. The tree has 47 leaf nodes and achieves 100% training accuracy but only 72% test
accuracy. The dataset has 5,000 samples with 20 features. Which approach is MOST likely to
improve test performance?
A. Adding 50 more features from external data sources
B. Limiting the maximum tree depth to 8 and setting a minimum of 20 samples per leaf
C. Using only the root node as a single-node tree (decision stump)
D. Retraining with the same parameters but using 10-fold cross-validation
Correct Answer: B. Limiting the maximum tree depth to 8 and setting a minimum of 20
samples per leaf
Rationale: This scenario demonstrates severe overfitting: a 47-leaf tree that reaches 100% training accuracy but only 72% test accuracy has memorized training-set noise rather than learned generalizable structure.
Limiting tree depth to 8 and requiring a minimum of 20 samples per leaf constrains the tree's complexity,
forcing it to learn more generalizable patterns. Adding more features (A) would likely worsen overfitting
by giving the tree more opportunities to memorize noise. A decision stump (C) would likely underfit and
produce high bias. Cross-validation (D) helps estimate generalization but does not change the model itself;
it is an evaluation strategy, not a regularization technique.
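
In scikit-learn these two constraints map directly onto the max_depth and min_samples_leaf parameters; a minimal sketch (synthetic data standing in for the loan dataset):

```python
# Sketch: constraining tree complexity with max_depth and min_samples_leaf.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

unconstrained = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
constrained = DecisionTreeClassifier(max_depth=8, min_samples_leaf=20,
                                     random_state=0).fit(X_tr, y_tr)

for name, tree in [("unconstrained", unconstrained), ("constrained", constrained)]:
    print(f"{name:>13}: leaves={tree.get_n_leaves():4d}  "
          f"train={tree.score(X_tr, y_tr):.3f}  test={tree.score(X_te, y_te):.3f}")
```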



8. What is the primary cause of the vanishing gradient problem in deep neural networks
using sigmoid activation functions?
A. The sigmoid function has a maximum output value of 1, causing numerical overflow
B. The derivative of the sigmoid function is bounded by 0.25, causing gradient magnitudes to shrink
exponentially as they are propagated backward through many layers
C. The sigmoid function is non-differentiable at x = 0
D. The sigmoid function produces negative outputs for negative inputs, canceling gradients
Correct Answer: B. The derivative of the sigmoid function is bounded by 0.25, causing
gradient magnitudes to shrink exponentially as they are propagated backward through many
layers
Rationale: The sigmoid function's derivative is sigma(x) * (1 - sigma(x)), which has a maximum value of
0.25 at x = 0 and approaches 0 for large positive or negative inputs. When computing gradients via the
chain rule during backpropagation, these small derivatives are multiplied across many layers, causing the
gradient to shrink exponentially (vanish). This makes it extremely difficult for early layers to learn
meaningful representations. Option A is incorrect because the sigmoid is bounded, preventing overflow.
Option C is incorrect because the sigmoid is differentiable everywhere. Option D is incorrect because the
sigmoid always outputs positive values between 0 and 1.
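
A back-of-the-envelope sketch of the exponential shrinkage (ignoring weight magnitudes, which can amplify or worsen it): even in the best case where every unit sits at x = 0, the chained derivative factor across L layers is at most 0.25^L.

```python
# Best-case chained sigmoid-derivative factor across L layers: 0.25 ** L.
# (Weights are ignored here; this isolates the activation-derivative term.)
for L in [1, 5, 10, 20]:
    print(f"{L:2d} layers: gradient factor <= {0.25 ** L:.3e}")
# 10 layers already gives <= 9.537e-07; early layers receive almost no signal.
```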



9. In a multi-layer perceptron (MLP), what is the purpose of having hidden layers with non-
linear activation functions?
A. To reduce the computational complexity of the forward pass
B. To enable the network to learn complex, non-linear decision boundaries and representations
C. To automatically perform feature selection by setting some weights to zero
D. To convert the network output into a probability distribution
Correct Answer: B. To enable the network to learn complex, non-linear decision boundaries
and representations
Rationale: Hidden layers with non-linear activation functions are what give neural networks their
representational power. Without non-linearity, stacking multiple linear layers would be equivalent to a
single linear transformation, severely limiting the types of functions the network can represent. Non-linear
activations allow the network to approximate complex, non-linear mappings from inputs to outputs.
Option A is incorrect because hidden layers increase computational cost. Option C describes the effect of
regularization techniques like L1, not hidden layers. Option D describes the purpose of the softmax function
in the output layer.
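
The claim that stacked linear layers collapse to one linear map can be checked in a few lines (numpy, with random matrices as stand-in weights):

```python
# Two linear layers without a nonlinearity equal one linear layer W2 @ W1.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)                  # forward pass through two linear layers
one_layer = (W2 @ W1) @ x                   # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))   # True: no extra expressive power
```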

