CS7643 Quiz 5 Questions with Detailed Verified Answers
Properties of the softmax function
1.) Softmax is permutation equivariant (permutation of input leads to the same permutation of
output)
2.) Softmax is not linear
3.) Softmax is differentiable
4.) Softmax interpolates between a distribution that selects an index uniformly at random and a
distribution that selects the argmax index with probability 1 (as the temperature is varied).
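A minimal sketch of property 4, using a temperature-scaled softmax (the temperature parameter here is an illustration of the interpolation, not notation from the notes):

```python
import numpy as np

def softmax(z, temperature=1.0):
    # Divide by temperature, then subtract the max for numerical stability.
    z = np.asarray(z, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

scores = [2.0, 1.0, 0.5]

# High temperature -> close to uniform over indices.
print(softmax(scores, temperature=100.0))

# Temperature 1 -> the usual softmax distribution.
print(softmax(scores, temperature=1.0))

# Low temperature -> nearly all mass on the argmax index.
print(softmax(scores, temperature=0.01))
```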
Attention
weighting or probability distribution over inputs that depends on computational state and inputs
What does softmax allow?
It allows randomly sampling an index from the original set, with a distribution whose
probabilities depend on the values in the set.
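A short sketch of that sampling idea: softmax turns raw values into a probability distribution, and indices are then drawn in proportion to it (the values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

values = np.array([3.0, 1.0, 0.2, -1.0])
p = softmax(values)

# Sample indices from the original set; larger values are drawn more often.
samples = rng.choice(len(values), size=10_000, p=p)
print(np.bincount(samples, minlength=len(values)) / 10_000)  # roughly matches p
```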
Softmax Attention
- Given a set of vectors "U" and a query vector "q"
- We can select the vector most similar to "q" via p = softmax(U * q)
what does Softmax attention do?
Gives a differentiable method of selecting a vector from a set.
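The two bullets above can be sketched directly; U and q below are toy values chosen for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A set of vectors U (one per row) and a query vector q.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.9, 0.1]])
q = np.array([1.0, 0.0])

# Dot-product similarity scores, turned into weights by softmax.
p = softmax(U @ q)

# Soft "selection": a weighted average of the rows of U, dominated by the
# rows most similar to q. Differentiable with respect to both U and q.
selected = p @ U
print(p)
print(selected)
```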
Forms of attention
hard and soft
Hard Attention
Samples are drawn from the distribution over the inputs.
Soft Attention
A summary of the inputs, computed as a weighted average with the distribution as the weights.
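The hard/soft distinction in one sketch (inputs and scores are made up; in practice hard attention needs an estimator such as REINFORCE to train through the sampling step):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

inputs = np.array([[1.0, 2.0],
                   [3.0, 0.0],
                   [0.5, 0.5]])
scores = np.array([2.0, 0.5, 0.1])
p = softmax(scores)

# Soft attention: a deterministic weighted average of the inputs (differentiable).
soft_output = p @ inputs

# Hard attention: sample a single input according to p (stochastic).
idx = rng.choice(len(inputs), p=p)
hard_output = inputs[idx]

print(soft_output, hard_output)
```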
The encoder-decoder model and how it's used by recurrent neural networks for seq2seq
prediction problems.
1.) Encoder RNN produces an encoding of the source sentence, which is passed to the decoder RNN
2.) Decoder RNN is a language model that generates the target sentence, conditioned on the encoding
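A toy sketch of those two steps with vanilla RNN cells. All weights, the vocabulary, and the conditioning scheme (initializing the decoder's hidden state with the encoding) are illustrative assumptions; real models learn the parameters and often use LSTM/GRU cells:

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 4, 5  # hidden size and toy vocabulary size (made up)

# Random toy parameters; a real model would learn these.
Wxh = rng.normal(size=(H, V)) * 0.5
Whh = rng.normal(size=(H, H)) * 0.5
Who = rng.normal(size=(V, H)) * 0.5

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def rnn_step(h, x):
    return np.tanh(Wxh @ x + Whh @ h)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 1) Encoder RNN: read the source tokens, keep the final hidden state.
source = [0, 2, 3]
h = np.zeros(H)
for tok in source:
    h = rnn_step(h, one_hot(tok))
encoding = h  # passed to the decoder

# 2) Decoder RNN: a language model conditioned on the encoding, here by
#    starting from it and greedily emitting tokens.
h, tok, target = encoding, 0, []
for _ in range(3):
    h = rnn_step(h, one_hot(tok))
    tok = int(np.argmax(softmax(Who @ h)))
    target.append(tok)
print(target)
```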
Beam search algorithm and how it works in seq2seq models using conditional probability.
-An algorithm that approximates the argmax over output sequences in machine translation. The
exact argmax must be approximated because the space of output sequences grows exponentially
with length; beam search explores this exponential space in time linear in sequence length.
-Beam size "k" determines the width of the search
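A toy beam search over a made-up conditional next-token model. Keeping only the k best partial hypotheses each step is what makes the cost linear in sequence length; the `toy_model` distribution is purely illustrative:

```python
import numpy as np

def beam_search(step_logprobs, k=2, length=3, vocab=3):
    """step_logprobs(prefix) -> log-probabilities over the next token."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            lp = step_logprobs(prefix)
            for tok in range(vocab):
                candidates.append((prefix + (tok,), score + lp[tok]))
        # Keep only the k best hypotheses: beam size k is the search width.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

# Made-up conditional model: prefers token 0, unless the last token was 0.
def toy_model(prefix):
    if not prefix or prefix[-1] != 0:
        p = np.array([0.7, 0.2, 0.1])
    else:
        p = np.array([0.1, 0.6, 0.3])
    return np.log(p)

for seq, score in beam_search(toy_model, k=2):
    print(seq, round(float(np.exp(score)), 4))
```

Note that greedy decoding (k = 1) would commit to token 0 first and end up with a lower-probability sequence; the beam keeps the runner-up alive long enough to find a better continuation.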
How is machine translation modeled?
Translation is often modeled as a conditional language model.
Transformers are known for?
Serving as conditional language models.