CS-7643 Quiz 4 Exam – Deep
Learning Optimization &
Regularization Study Guide
Embedding - ANSWER-A learned map from entities to vectors that encodes
similarity
Graph Embedding - ANSWER-Optimize an objective under which connected nodes have
more similar embeddings than unconnected nodes.
Task: convert nodes to vectors
- effectively unsupervised learning where nearest neighbors are similar
- these learned vectors are useful for downstream tasks
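The pull-together/push-apart objective above can be sketched with plain gradient steps. This is an illustrative toy (the node set, margin of 2.0, and update rule are assumptions, not the course's exact objective): connected pairs are pulled together, unconnected pairs are repelled while closer than the margin.

```python
import numpy as np

# Toy sketch (assumed setup, not the course's exact objective):
# learn 2-D node embeddings so connected pairs end up closer
# than unconnected pairs.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2)]        # connected pairs
non_edges = [(0, 3), (2, 3)]    # unconnected pairs
emb = rng.normal(size=(4, 2))   # one 2-D vector per node

lr = 0.1
for _ in range(200):
    for i, j in edges:          # pull connected nodes together
        grad = emb[i] - emb[j]
        emb[i] -= lr * grad
        emb[j] += lr * grad
    for i, j in non_edges:      # push unconnected nodes apart
        diff = emb[i] - emb[j]
        dist = np.linalg.norm(diff)
        if dist < 2.0:          # margin: only repel while too close
            emb[i] += lr * diff / (dist + 1e-8)
            emb[j] -= lr * diff / (dist + 1e-8)

d_conn = np.linalg.norm(emb[0] - emb[1])    # connected pair distance
d_unconn = np.linalg.norm(emb[0] - emb[3])  # unconnected pair distance
```

After training, the connected pair ends up closer than the unconnected one, which is exactly the "nearest neighbors are similar" property the card describes.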
Multi-layer Perceptron (MLP) pain points for NLP - ANSWER-- Cannot easily
support variable-sized sequences as inputs or outputs
- No inherent temporal structure
- No practical way of holding state
- The size of the network grows with the maximum allowed size of the input or
output sequences
Truncated Backpropagation Through Time - ANSWER-- Only backpropagate an RNN
through the most recent T time steps rather than the full sequence
Recurrent Neural Networks (RNN) - ANSWER-h(t) = activation(U*input + V*h(t-1) +
bias)
y(t) = activation(W*h(t) + bias)
- activation is typically the logistic function or tanh
- outputs can also simply be h(t)
- family of NN architectures for modeling sequences
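The two recurrence equations on this card can be written out directly. A minimal sketch with tanh as the activation (the dimensions and random weights are illustrative assumptions):

```python
import numpy as np

# Minimal sketch of a vanilla RNN step, matching the card's equations:
# h(t) = tanh(U x(t) + V h(t-1) + b_h),  y(t) = tanh(W h(t) + b_y).
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2  # assumed sizes
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
V = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

def rnn_step(x, h_prev):
    h = np.tanh(U @ x + V @ h_prev + b_h)  # new hidden state
    y = np.tanh(W @ h + b_y)               # output at this step
    return h, y

# Run over a length-5 sequence, carrying the hidden state forward.
h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, y = rnn_step(x, h)
```

The same weights U, V, W are reused at every step, which is how the network handles variable-length sequences without growing with the input size.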
Training vanilla RNNs: difficulties - ANSWER-- Vanishing and exploding gradients
- Backpropagating through t steps multiplies in the recurrent weight at every
step, so the gradient scales like w^t
- if w > 1: exploding gradients
- if w < 1: vanishing gradients
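The w^t behavior is easy to see numerically with a scalar stand-in for the recurrent weight:

```python
# Toy illustration of the w^t factor: the "gradient" picks up one
# factor of the recurrent weight w per time step.
def backprop_factor(w, steps=50):
    return w ** steps

vanishing = backprop_factor(0.9)  # w < 1: shrinks toward 0
exploding = backprop_factor(1.1)  # w > 1: blows up
```

After only 50 steps the two factors already differ by more than four orders of magnitude, which is why vanilla RNNs struggle with long-range dependencies.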
Long Short-Term Memory Network Gates and States - ANSWER-- f(t) = forget gate
- i(t) = input gate
- u(t) = candidate update gate
- o(t) = output gate
- c(t) = cell state
- c(t) = f(t) * c(t - 1) + i(t) * u(t)
- h(t) = o(t) * tanh(c(t))
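The gates above combine into one step as follows. A minimal sketch using the card's gate names (dimensions and random weights are assumptions; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal sketch of one LSTM step with the card's gate names.
# Each gate maps the concatenation [x; h_prev] to hidden_dim.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # assumed sizes
Wf, Wi, Wu, Wo = (rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim))
                  for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z)      # f(t): forget gate
    i = sigmoid(Wi @ z)      # i(t): input gate
    u = np.tanh(Wu @ z)      # u(t): candidate update
    o = sigmoid(Wo @ z)      # o(t): output gate
    c = f * c_prev + i * u   # c(t) = f(t)*c(t-1) + i(t)*u(t)
    h = o * np.tanh(c)       # new hidden state
    return h, c

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x, h, c)
```

Because the cell state update is additive (gated copy of c(t-1) plus a gated new term), gradients can flow through c over many steps without the repeated w^t multiplication that cripples vanilla RNNs.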