and Answers | 2025 Update | 100% Correct-GT
Georgia Institute of Technology – Deep Learning Course
Official Exam Blueprint – 2025 Update
Exam Specifications
Section | Questions
Generative Models (GANs, VAEs, Diffusion, Flows, Metrics) | 12
Attention Mechanisms & Transformers | 11
Reinforcement Learning | 10
Advanced Optimization | 8
Unsupervised & Self-Supervised Learning | 4
Interpretability & Explainability | 4
SECTION 1: GENERATIVE MODELS (12 Questions)
Q1: In the standard GAN minimax objective V(D,G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], what
does the discriminator D attempt to maximize during training?
• A. The probability that generated samples G(z) are classified as real
• B. The probability that real samples x are classified as fake
• C. The probability that real samples are classified as real and generated samples are
classified as fake
• D. The KL divergence between the real and generated data distributions
Correct Answer: C
Rationale: Correct because the discriminator D aims to maximize the probability of correctly
classifying real samples as real (log D(x)) and generated samples as fake (log(1 - D(G(z)))), which
corresponds to maximizing the overall objective V(D,G).
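To make the objective concrete, here is a minimal sketch that estimates V(D,G) from a batch of discriminator outputs. It assumes those outputs are already probabilities in (0, 1), and the function name gan_value is hypothetical. Both log terms are at most 0, so a discriminator that pushes D(x) toward 1 and D(G(z)) toward 0 raises V toward its upper bound of 0.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D,G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: hypothetical discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: hypothetical discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from generated samples scores higher (closer to 0)...
good_d = gan_value(d_real=[0.9, 0.95], d_fake=[0.05, 0.1])
# ...than one that cannot tell them apart (D = 0.5 everywhere gives -2 log 2 = -log 4).
confused_d = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(good_d, confused_d)
```

A discriminator that outputs 0.5 everywhere scores −log 4, which is the value of V at the optimal discriminator once the generated distribution matches the data distribution.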
Q2: Which of the following is a primary cause of mode collapse in GAN training?
• A. The discriminator learning too slowly relative to the generator
• B. The generator discovering a small subset of samples that consistently fool the
discriminator
• C. Using spectral normalization in the discriminator architecture
• D. Applying gradient penalty to the generator loss
Correct Answer: B
Rationale: Correct because mode collapse occurs when the generator learns to produce a
limited variety of outputs that the discriminator cannot distinguish from real data, causing the
generator to collapse to a few modes of the data distribution.
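Mode collapse is easiest to see on a toy target. The sketch below assumes a mixture of eight 2-D Gaussians placed on a ring (a toy setup often used to visualize mode collapse) and counts how many mixture components receive at least one generated sample. The helper names, the 0.3 distance threshold, and the sample counts are illustrative assumptions, and the two "generators" are just hand-built sample sets.

```python
import math
import random

def ring_modes(k=8, radius=2.0):
    """Centers of a toy mixture: k Gaussians evenly spaced on a circle."""
    return [(radius * math.cos(2 * math.pi * i / k),
             radius * math.sin(2 * math.pi * i / k)) for i in range(k)]

def modes_covered(samples, centers, max_dist=0.3):
    """Count distinct modes whose center has at least one sample within max_dist."""
    hit = set()
    for x, y in samples:
        dists = [math.hypot(x - cx, y - cy) for cx, cy in centers]
        nearest = min(range(len(centers)), key=dists.__getitem__)
        if dists[nearest] <= max_dist:
            hit.add(nearest)
    return len(hit)

rng = random.Random(0)
centers = ring_modes()
# Healthy "generator": samples spread across every mode.
healthy = [(cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05))
           for cx, cy in centers for _ in range(50)]
# Collapsed "generator": every sample lands near a single mode.
cx0, cy0 = centers[0]
collapsed = [(cx0 + rng.gauss(0, 0.05), cy0 + rng.gauss(0, 0.05)) for _ in range(400)]
print(modes_covered(healthy, centers), modes_covered(collapsed, centers))
```

Each collapsed sample can look realistic on its own, which is why the discriminator can be fooled; only a coverage-style statistic over many samples reveals the missing modes.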
Q3: In a Wasserstein GAN (WGAN), what is the primary motivation for replacing the Jensen-Shannon
divergence with the Wasserstein (Earth Mover) distance?
• A. To enable the use of batch normalization in the generator
• B. To provide a meaningful loss signal even when the real and generated data
distributions have little or no overlap
• C. To reduce the number of training parameters in the network
• D. To eliminate the need for a discriminator network entirely
Correct Answer: B
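Rationale: Correct because the Wasserstein distance is continuous in the generator's parameters
(and differentiable almost everywhere under mild assumptions), so it remains a meaningful measure
of how far apart the distributions are even when they are disjoint. The Jensen-Shannon divergence,
by contrast, is constant at log 2 when the supports are disjoint, so it saturates and provides
vanishing gradients to the generator.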
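A minimal numeric illustration, assuming the real distribution is a point mass at 0 and the generated distribution is a point mass at θ. The supports are disjoint for any θ ≠ 0, so the Jensen-Shannon divergence stays at log 2 ≈ 0.693 no matter how far apart they are, while the 1-D Wasserstein distance equals |θ| and shrinks smoothly as θ moves toward 0. The helper names are hypothetical, and wasserstein_1d handles only equal-size 1-D samples, where W1 reduces to the mean absolute difference of the sorted samples.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions
    given as {support_point: probability} dicts."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl(a, b):
        # KL(a || b); terms with a[x] == 0 contribute nothing.
        return sum(a[x] * math.log(a[x] / b[x]) for x in a if a[x] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(xs, ys):
    """W1 between two 1-D empirical distributions with equal sample counts."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Real data: point mass at 0. Generated data: point mass at theta.
for theta in [0.0, 0.5, 1.0, 2.0]:
    js = js_divergence({0.0: 1.0}, {theta: 1.0})
    w = wasserstein_1d([0.0], [theta])
    print(f"theta={theta}: JS={js:.4f}  W1={w:.4f}")
```

The JS column is flat for every θ ≠ 0, so its gradient with respect to θ is zero there; the W1 column tells the generator which direction to move.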
Q4: What is the primary purpose of the gradient penalty in WGAN-GP compared to weight
clipping in the original WGAN?
• A. To increase the training speed by a factor of two
• B. To enforce the Lipschitz constraint more effectively by penalizing the gradient norm of
the critic
• C. To remove the need for the critic network
• D. To convert the WGAN into a conditional GAN
Correct Answer: B
Rationale: Correct because WGAN-GP replaces weight clipping with a gradient penalty that
penalizes deviations of the critic's gradient norm from 1 at points sampled between real and
generated samples. This enforces the 1-Lipschitz constraint softly and avoids the capacity
reduction and exploding or vanishing gradients associated with weight clipping.
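The penalty itself is short to write down. The sketch below assumes a hypothetical two-input toy critic f(x) = Σ w_i tanh(x_i) in place of a neural network, and approximates its gradient with central finite differences so it runs without a deep learning library; real implementations obtain the gradient by automatic differentiation (for example torch.autograd.grad with create_graph=True in PyTorch). Each x̂ is sampled uniformly on the segment between a real and a generated point, and λ = 10 follows the value recommended in the WGAN-GP paper.

```python
import math
import random

def critic(x, w):
    """Hypothetical toy critic f(x) = sum_i w_i * tanh(x_i), standing in for a network."""
    return sum(wi * math.tanh(xi) for wi, xi in zip(w, x))

def grad_norm(f, x, h=1e-5):
    """L2 norm of the gradient of scalar function f at point x (central differences)."""
    squared = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        partial = (f(x_plus) - f(x_minus)) / (2 * h)
        squared += partial ** 2
    return math.sqrt(squared)

def gradient_penalty(real, fake, w, rng, lam=10.0):
    """lam * mean over pairs of (||grad f(x_hat)||_2 - 1)^2, where each x_hat is
    drawn uniformly on the segment between a real point and a generated point."""
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        eps = rng.random()  # eps ~ U[0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x_real, x_fake)]
        g = grad_norm(lambda x: critic(x, w), x_hat)
        total += (g - 1.0) ** 2
    return lam * total / len(real)

real = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]]
fake = [[-0.4, 0.6], [0.8, -0.5], [0.2, 0.0]]
rng = random.Random(0)
print(gradient_penalty(real, fake, w=[0.6, 0.8], rng=rng))    # small: gradient norm stays <= 1
print(gradient_penalty(real, fake, w=[30.0, 40.0], rng=rng))  # large: critic far too steep
```

The penalty is added to the critic's loss, not the generator's. A critic whose gradients are much steeper than 1 is penalized heavily, which is how the soft constraint keeps it approximately 1-Lipschitz without clipping its weights.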
Q5: In a Variational Autoencoder (VAE), what does the reparameterization trick enable?
• A. Direct optimization of the discrete latent variables using gradient descent
• B. Backpropagation through the stochastic sampling operation z = μ + σ * ε, where ε ~
N(0,1)
• C. Elimination of the KL divergence term from the ELBO
• D. Conversion of the VAE into a deterministic autoencoder
Correct Answer: B
Rationale: Correct because the reparameterization trick rewrites the sampling operation as a
deterministic function of the parameters (μ, σ) and a noise variable ε whose distribution does not
depend on those parameters, allowing gradients to flow through the sampling operation during
backpropagation.
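A minimal sketch of the trick, assuming the encoder outputs a mean mu and a log-variance log_var (a common parameterization, assumed here rather than taken from the question); the function names are hypothetical. Because z = mu + sigma * eps with eps drawn independently of the parameters, dz/dmu = 1, so the gradient of an expectation such as E[z^2] can be estimated by averaging ordinary derivatives along sampled paths. The exact value is d/dmu (mu^2 + sigma^2) = 2 * mu, which gives a check on the estimate.

```python
import math
import random

def sample_z(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1).
    Returns (z, eps) so the noise used on this path can be inspected."""
    sigma = math.exp(0.5 * log_var)
    eps = rng.gauss(0.0, 1.0)  # noise drawn independently of mu and sigma
    return mu + sigma * eps, eps

def pathwise_grad_mu(mu, log_var, n, rng):
    """Monte Carlo estimate of d/dmu E[z^2] through the reparameterized path."""
    total = 0.0
    for _ in range(n):
        z, _ = sample_z(mu, log_var, rng)
        total += 2.0 * z  # chain rule: d(z^2)/dz * dz/dmu = 2z * 1
    return total / n

rng = random.Random(0)
est = pathwise_grad_mu(mu=1.5, log_var=0.0, n=100_000, rng=rng)
print(est)  # should be close to the exact value 2 * mu = 3.0
```

Sampling z directly from N(mu, sigma^2) would give the same values but no path from z back to mu; writing it as mu + sigma * eps is what makes the derivative available.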
Q6: Which of the following best describes the trade-off between VAEs and GANs in terms of
sample quality and diversity?
• A. VAEs produce sharper samples but with less diversity; GANs produce blurrier samples
with more diversity
• B. VAEs produce blurrier but more diverse samples; GANs produce sharper but less diverse
samples
• C. Both VAEs and GANs produce equally sharp and diverse samples
• D. VAEs cannot generate new samples, only reconstruct inputs
Correct Answer: B
Rationale: Correct because VAEs maximize a likelihood-based objective (the evidence lower bound,
or ELBO), which penalizes the model for assigning low likelihood to any training example and
therefore encourages covering the full data distribution, resulting in diverse but often blurry
samples. GANs instead optimize sample realism through adversarial training, often producing
sharper but less diverse outputs.
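For a VAE, the likelihood-based objective is the evidence lower bound (ELBO), whose negative is a reconstruction term plus the KL divergence from the approximate posterior to the prior. The sketch below assumes a diagonal Gaussian encoder, a standard normal prior, and a unit-variance Gaussian decoder, so reconstruction reduces to half the squared error with constants dropped; the function names are hypothetical. A squared-error reconstruction term rewards predicting an average of plausible outputs, which is one frequently cited reason for blurry VAE samples.

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def negative_elbo(x, x_recon, mu, log_var):
    """Per-example negative ELBO under a unit-variance Gaussian decoder:
    0.5 * squared reconstruction error (additive constants dropped) + KL term."""
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + gaussian_kl_to_standard_normal(mu, log_var)

print(gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: posterior equals prior
print(negative_elbo(x=[1.0, 0.0], x_recon=[0.8, 0.1], mu=[0.5, -0.5], log_var=[0.0, 0.0]))
```

Minimizing this quantity is equivalent to maximizing the ELBO; the KL term is the same one that option C of Q5 refers to.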
Q7: In a denoising diffusion probabilistic model (DDPM), what occurs during the forward
diffusion process?
• A. The model iteratively denoises a latent representation to generate a clean image
• B. Gaussian noise is progressively added to the data over a series of timesteps until the
data becomes pure noise
• C. The model learns to directly map random noise to a data sample in a single step
• D. The discriminator evaluates the quality of generated samples at each timestep