Exam (elaborations)

CS7643 QUIZ 4: RECURRENT NETWORKS, EMBEDDINGS & SEQUENCE MODELING

Pages
19
Grade
A+
Uploaded on
21-06-2026
Written in
2025/2026

This document contains study material and practice questions for CS7643 Quiz 4, focusing on recurrent neural networks, embeddings, and sequence modeling techniques in deep learning. Topics include recurrent network architectures, sequence processing, word embeddings, language modeling, long short-term memory (LSTM) networks, gated recurrent units (GRUs), sequence-to-sequence models, attention mechanisms, training challenges, and practical applications in natural language processing and time-series analysis. It is designed to help students prepare for quizzes and strengthen their understanding of sequence-based machine learning models.

Institution
CS7643
Course
CS7643

Content preview

CS7643 QUIZ 4: RECURRENT NETWORKS, EMBEDDINGS & SEQUENCE MODELING

SECTION A: RECURRENT NEURAL NETWORKS (10 Questions)

Q1: In a vanilla RNN with update rule h(t) = tanh(U·x(t) + V·h(t-1) + b), what is the
primary computational disadvantage during training?

A. The model requires O(T²) memory to store all intermediate hidden states.
B. The forward pass cannot be parallelized across time steps due to sequential
dependency. [CORRECT]

C. The backward pass can be fully parallelized using modern GPU architectures.
D. The number of parameters scales linearly with sequence length T.

Correct Answer: B

Rationale: Correct because the hidden state h(t) depends on h(t-1), forcing
sequential computation with runtime O(T) that cannot be parallelized across the
time dimension.
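The sequential dependency can be seen in a minimal sketch of the forward pass. The helper names (`matvec`, `rnn_forward`) and the toy weights are assumptions made for illustration only; the update rule is the one stated in Q1.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, V, b, h0):
    """Run h(t) = tanh(U·x(t) + V·h(t-1) + b) over a sequence.

    Each step reads the previous hidden state, so the loop over t
    has to run in order and cannot be split across time steps.
    """
    h = h0
    states = []
    for x in xs:  # T sequential iterations
        pre = [ux + vh + bi for ux, vh, bi in zip(matvec(U, x), matvec(V, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

# Toy sizes (assumed): input dimension 1, hidden dimension 2, T = 3.
U = [[0.5], [-0.3]]
V = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
xs = [[1.0], [0.5], [-1.0]]
states = rnn_forward(xs, U, V, b, h0=[0.0, 0.0])
```

Note that the parameters U, V and b are shared across all time steps, so their count does not depend on T.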

Q2: A vanilla RNN is trained on sequences of length T=100. Analysis shows that
gradients with respect to early time step inputs are approximately zero. What is
the most likely cause?

A. The learning rate is too high, causing gradient descent to oscillate.

B. The weight matrix V has spectral radius less than 1, causing vanishing
gradients. [CORRECT]
C. The activation function is ReLU rather than tanh.
D. The input dimension is larger than the hidden dimension.

Correct Answer: B

Rationale: Correct because backpropagating through k time steps multiplies k
Jacobians ∂h(t)/∂h(t-1) = diag(1 - h(t)²)·V; with the tanh derivative at most 1 and
the spectral radius of V less than 1, this product shrinks roughly like V^k,
producing vanishing gradients for early time steps.
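A scalar sketch makes the decay concrete. It assumes a one-dimensional RNN h(t) = tanh(u·x(t) + v·h(t-1)), where the spectral radius of the recurrent weight is simply |v|; the function name and the constant inputs are hypothetical.

```python
import math

def grad_wrt_initial_state(v, u, xs, h0=0.0):
    """Return d h(T) / d h(0) for the scalar RNN h(t) = tanh(u*x(t) + v*h(t-1))."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(u * x + v * h)
        grad *= v * (1.0 - h * h)  # chain rule: one Jacobian factor per step
    return grad

xs = [0.1] * 100
g_short = grad_wrt_initial_state(v=0.9, u=1.0, xs=xs[:5])  # 5 steps back
g_long = grad_wrt_initial_state(v=0.9, u=1.0, xs=xs)       # 100 steps back
```

Every factor has magnitude at most |v| = 0.9, so the 100-step gradient is bounded by 0.9^100 and is far smaller than the 5-step one.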

Q3: Which RNN architecture is most appropriate for sentiment classification,
where a single sentiment label must be produced for an input sentence of
variable length?

A. N-to-N architecture with one output per word.

B. N-to-1 architecture that maps the final hidden state to a single output.
[CORRECT]

C. 1-to-N architecture that generates a sequence from a single input vector.

D. Encoder-decoder with attention over all intermediate states.

Correct Answer: B

Rationale: Correct because sentiment classification requires mapping a
variable-length input sequence to a single output label, which is precisely the
N-to-1 architecture where the final hidden state encodes the entire sequence.
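As a rough illustration of the N-to-1 pattern, the sketch below runs a scalar RNN over sequences of different lengths and reads one probability off the final hidden state. The scalar inputs stand in for word embeddings, and the untrained parameter values and the name `classify_sequence` are assumptions.

```python
import math

def classify_sequence(xs, u, v, b, w_out, b_out):
    """N-to-1: consume the whole sequence, emit one probability from the final state."""
    h = 0.0
    for x in xs:
        h = math.tanh(u * x + v * h + b)
    logit = w_out * h + b_out          # output layer applied once, after the last step
    return 1.0 / (1.0 + math.exp(-logit))

# Untrained, hypothetical parameters shared by both calls.
params = dict(u=1.0, v=0.5, b=0.0, w_out=2.0, b_out=0.0)
p_short = classify_sequence([0.2, -0.1], **params)
p_long = classify_sequence([0.2, -0.1, 0.4, 0.3, -0.2, 0.1], **params)
```

The same parameters handle both a length-2 and a length-6 input, and each call yields exactly one output.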

Q4: During training of a vanilla RNN, gradient norms suddenly spike to values
exceeding 1000. Which technique should be applied?

A. Reduce the learning rate by a factor of 10.

B. Apply gradient clipping to bound the maximum gradient norm. [CORRECT]

C. Switch from SGD to Adam optimizer immediately.

D. Increase the hidden state dimension to absorb larger gradients.

Correct Answer: B

Rationale: Correct because exploding gradients can arise when the spectral radius
of the recurrent weights exceeds 1; gradient clipping rescales the gradient
whenever its norm exceeds a threshold, bounding the size of each update without
modifying the architecture.
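Clipping by norm can be sketched in a few lines. This is a simplified stand-in operating on a flat list of gradient values, not a framework API (PyTorch offers `torch.nn.utils.clip_grad_norm_` for the same purpose); the function name and the threshold of 5.0 are assumptions.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm; return (clipped, original norm)."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads), total      # already within the bound: leave unchanged
    scale = max_norm / total
    return [g * scale for g in grads], total

# A spiking gradient with norm 1000, as in the question.
clipped, norm_before = clip_by_global_norm([600.0, -800.0], max_norm=5.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
```

Rescaling preserves the gradient's direction and only shrinks its length, which is why clipping is usually preferred to dropping the update entirely.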

Q5: In teacher forcing during RNN training, what input is fed at time step t+1?

A. The model's own predicted output from time step t.

B. The ground-truth output y(t) from the training data, in place of the model's
own prediction for step t. [CORRECT]

C. A weighted average of the prediction and ground truth.

D. The hidden state from time step t passed through the output layer.

Correct Answer: B

Rationale: Correct because teacher forcing uses the actual training data value as
the next input rather than the model's prediction, which emerges from maximum
likelihood estimation and prevents error accumulation during training.
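The difference between teacher forcing and free-running generation comes down to where each step's input comes from. The sketch below only builds the input sequences; the function names, the `<s>` start token, and the constant "model" are hypothetical.

```python
def teacher_forced_inputs(targets, start_token):
    """Training inputs: the start token, then the ground-truth sequence shifted right by one."""
    return [start_token] + list(targets[:-1])

def free_running_inputs(step_fn, start_token, num_steps):
    """Inference inputs: each step consumes the model's own previous prediction."""
    inputs, token = [], start_token
    for _ in range(num_steps):
        inputs.append(token)
        token = step_fn(token)
    return inputs

targets = ["the", "cat", "sat"]
tf_inputs = teacher_forced_inputs(targets, "<s>")

# Hypothetical stand-in for a poorly trained model that always predicts "the".
fr_inputs = free_running_inputs(lambda tok: "the", "<s>", num_steps=3)
```

With teacher forcing the third input is the true token "cat" even if the model predicted something else, so an early mistake does not contaminate later steps during training.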

Q6: A researcher replaces hidden-to-hidden recurrence with teacher forcing at
every time step during both training and inference. What is the primary
consequence?

A. The model becomes unable to handle variable-length sequences.

B. The model can be parallelized across time steps but loses the ability to
propagate information through hidden states. [CORRECT]

C. The vanishing gradient problem is completely eliminated.

D. The model requires twice as many parameters as a standard RNN.

Correct Answer: B

Rationale: Correct because removing hidden-to-hidden recurrence eliminates the
sequential dependency chain, enabling parallelization, but the model loses the
recurrent path for propagating information across time steps, making it less
powerful than a true RNN.

Q7: Truncated backpropagation through time (BPTT) with truncation parameter
k=10 on sequences of length T=100 means:

A. Only the first 10 time steps are used in the forward pass.

B. Gradients are backpropagated through at most 10 time steps before
truncation. [CORRECT]

C. The hidden state is reset to zero every 10 time steps.

D. The model processes the sequence in 10 non-overlapping chunks.

Correct Answer: B

Rationale: Correct because truncated BPTT limits the temporal span of gradient
computation to k steps, approximating full BPTT while controlling computational
cost and mitigating vanishing/exploding gradients in long sequences.
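To see what truncation changes, consider a deliberately simplified linear scalar RNN h(t) = w·h(t-1) + x(t), chosen only because its gradient is easy to write by hand. The function names are hypothetical. The forward pass still covers all T steps; only the backward pass stops after k steps.

```python
def forward(w, xs, h0=0.0):
    """Return [h(0), h(1), ..., h(T)] for h(t) = w*h(t-1) + x(t)."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs

def grad_final_state_wrt_w(w, hs, k=None):
    """d h(T) / d w, backpropagating through at most k steps (all steps if k is None)."""
    T = len(hs) - 1
    steps = T if k is None else min(k, T)
    grad, carry = 0.0, 1.0             # carry = d h(T) / d h(t)
    for t in range(T, T - steps, -1):
        grad += carry * hs[t - 1]      # direct dependence of h(t) on w
        carry *= w                     # move back one time step
    return grad

xs = [1.0] * 100
hs = forward(0.5, xs)                  # full forward pass over T = 100 steps
full = grad_final_state_wrt_w(0.5, hs)
truncated = grad_final_state_wrt_w(0.5, hs, k=10)
```

With w = 0.5 the contributions from more than 10 steps back are tiny, so the truncated gradient is a close approximation here; with weights near or above 1 the discarded terms would matter more.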

Q8: Which of the following is NOT a valid criticism of using MLPs for NLP tasks
compared to RNNs?

A. MLPs cannot easily support variable-sized input sequences.

B. MLPs have no inherent mechanism for modeling temporal structure.

C. MLPs require network size to grow with maximum allowed sequence length.

D. MLPs suffer from vanishing gradients across time steps. [CORRECT]

Correct Answer: D

Rationale: Correct because vanishing gradients across time steps is a problem
specific to recurrent architectures with repeated weight multiplication; MLPs

