OL Unit Quiz
CS 7641: Machine Learning
Solutions
DO NOT DISTRIBUTE OUTSIDE OF CS7641
1 Section 1 - Randomized Optimization
Part 1. Genetic Algorithms (MCMA)
Which statements about GAs are true? Select all that apply.
a) Typical loop: initialize population → evaluate fitness → select parents → crossover/mutation
→ form next generation.
b) Elitism copies the top few individuals unchanged to preserve best-so-far fitness.
c) Excessive selection pressure can cause premature convergence and loss of diversity.
d) GA encodings must be binary; real-valued representations are invalid.
e) Roulette-wheel selection always dominates tournament selection.
f) GAs require a convex objective for convergence guarantees.
Answer key: a, b, c.
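The loop in (a) and the elitism in (b) can be sketched as follows. This is a minimal illustration, not a reference implementation: the OneMax fitness, tournament selection, single-point crossover, and every parameter value (`pop_size`, `n_elite`, `k`, `p_mut`) are assumptions chosen for the example.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption for this sketch): count of 1-bits."""
    return sum(bits)

def tournament(pop, fitness, k, rng):
    """Pick k individuals at random and return the fittest of them."""
    return max(rng.sample(pop, k), key=fitness)

def run_ga(n_bits=20, pop_size=30, generations=40, n_elite=2, k=3,
           p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Initialize population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Evaluate fitness and rank.
        ranked = sorted(pop, key=one_max, reverse=True)
        best_history.append(one_max(ranked[0]))
        # Elitism: copy the top few individuals unchanged.
        next_pop = [ind[:] for ind in ranked[:n_elite]]
        while len(next_pop) < pop_size:
            # Select parents.
            a = tournament(pop, one_max, k, rng)
            b = tournament(pop, one_max, k, rng)
            # Single-point crossover, then bit-flip mutation.
            cut = rng.randint(1, n_bits - 1)
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop  # Form next generation.
    best = max(pop, key=one_max)
    best_history.append(one_max(best))
    return best, best_history

best, history = run_ga()
print(history[0], history[-1])
```

Because the elites are copied unchanged, the best fitness in `history` never decreases from one generation to the next. Raising the tournament size `k` increases selection pressure, which is the mechanism behind the premature convergence described in (c).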
Part 2. MIMIC: Model Building & Sampling (MCMA)
Which statements describe MIMIC correctly? Select all that apply.
a) Select an elite set, fit a probabilistic model to those elites, and sample new candidates from
that model.
b) Uses a tree-structured distribution learned via a maximum-weight spanning tree on pairwise
mutual information (Chow–Liu).
c) If you assume full independence across variables, MIMIC reduces to an EDA with factorized
marginals.
d) MIMIC requires gradients to update its model parameters.
e) The dependency structure is learned from the elite samples, biasing search toward promising
regions.
f) If the learned tree is imperfect, MIMIC can never sample better solutions.
Answer key: a, b, c, e.
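The select-fit-sample cycle in (a) can be illustrated with the fully factorized special case from (c). Note that this sketch is not MIMIC itself: it replaces the Chow–Liu tree with independent per-bit marginals. The function name `factorized_eda`, the clamping of marginals to [0.05, 0.95], and all parameter values are assumptions made for the example.

```python
import random

def factorized_eda(fitness, n_bits=20, n_samples=60, elite_frac=0.25,
                   iterations=30, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n_bits            # independent marginals P(x_i = 1)
    n_elite = max(1, int(n_samples * elite_frac))
    best, best_fit = None, float("-inf")
    for _ in range(iterations):
        # Sample new candidates from the current model.
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(n_samples)]
        samples.sort(key=fitness, reverse=True)
        top_fit = fitness(samples[0])
        if top_fit > best_fit:
            best, best_fit = samples[0][:], top_fit
        # Select the elite set and refit the model to it.
        elites = samples[:n_elite]
        probs = [
            # Clamp (an assumption) so no bit becomes permanently fixed.
            min(0.95, max(0.05, sum(e[i] for e in elites) / n_elite))
            for i in range(n_bits)
        ]
    return best, best_fit, probs

best, best_fit, probs = factorized_eda(sum)
print(best_fit)
```

No gradients appear anywhere: the model is updated purely by counting over the elite samples, which is why (d) is false. MIMIC proper would replace the `probs` list with conditional distributions along a tree learned from pairwise mutual information.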
Part 3. Simulated Annealing (Numerical)
A simulated annealing (SA) optimizer uses the Metropolis rule: an uphill move with cost increase
∆E > 0 at temperature T is accepted with probability exp(−∆E/T); a move that does not increase
the cost (∆E ≤ 0) is accepted with probability 1. Unless stated otherwise, assume independent proposals.
(a) Single uphill move at fixed temperature. At temperature T = 2.5, the algorithm
proposes a single uphill move with ∆E = 1.2. Question: What is the acceptance probability?
Answer (number): 0.619
Derivation: p = exp(−∆E/T) = exp(−1.2/2.5) = exp(−0.48) ≈ 0.619.
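The Metropolis acceptance rule used in this part can be checked numerically. A minimal sketch, where the function name `acceptance_probability` is an assumption made for illustration:

```python
import math

def acceptance_probability(delta_e, temperature):
    """Metropolis rule: non-uphill moves are always accepted;
    uphill moves are accepted with probability exp(-delta_e / T)."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)

p = acceptance_probability(1.2, 2.5)
print(f"{p:.3f}")  # 0.619
```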
(b) Geometric cooling and acceptance. SA uses geometric cooling T_k = T_0 · α^k with T_0 = 10,
α = 0.9. After k = 20 temperature drops, an uphill move with ∆E = 0.8 is proposed. Question:
What is the acceptance probability at step k = 20?
Answer (number): 0.518
Derivation: T_20 = 10 · 0.9^20 ≈ 10 · 0.1215767 = 1.215767. p = exp(−0.8/T_20) = exp(−0.8/1.215767) ≈
exp(−0.6580) ≈ 0.518.
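The geometric schedule and the resulting acceptance probability can be reproduced in a few lines. A minimal sketch; the helper name `geometric_temperature` is an assumption made for illustration:

```python
import math

def geometric_temperature(t0, alpha, k):
    """Temperature after k geometric cooling steps: T_k = T_0 * alpha**k."""
    return t0 * alpha ** k

t20 = geometric_temperature(10.0, 0.9, 20)
p20 = math.exp(-0.8 / t20)   # Metropolis acceptance for an uphill move, dE = 0.8
print(f"T_20 = {t20:.6f}, p = {p20:.3f}")  # T_20 = 1.215767, p = 0.518
```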
(c) Multiple independent proposals at the same temperature. At the same temperature
as in Part (b), the algorithm makes m = 5 independent uphill proposals, each with the same
∆E = 0.8. Question: What is the probability that at least one of these five proposals is accepted?
Answer (number): 0.974
Derivation: From Part (b), single-trial acceptance p ≈ 0.518. P(at least one accept) = 1 − (1 − p)^5 ≈
1 − (1 − 0.518)^5 = 1 − (0.482)^5 ≈ 1 − 0.026 ≈ 0.974.
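The complement rule for independent proposals can be verified the same way. A minimal sketch that recomputes the single-trial probability from T_0 = 10, α = 0.9, k = 20 rather than using the rounded value; the function name `prob_at_least_one` is an assumption made for illustration:

```python
import math

def prob_at_least_one(p, m):
    """Probability that at least one of m independent trials,
    each succeeding with probability p, succeeds."""
    return 1.0 - (1.0 - p) ** m

t20 = 10.0 * 0.9 ** 20             # temperature after 20 geometric drops
p_single = math.exp(-0.8 / t20)    # single-proposal acceptance probability
p_any = prob_at_least_one(p_single, 5)
print(f"{p_any:.3f}")  # 0.974
```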
Note: All numbers rounded to three decimal places.