Neural Networks
Instructor: Jaskirat Singh
October 7, 2024
The vanishing and exploding gradient problems are significant challenges in training
Recurrent Neural Networks (RNNs), particularly when working with deep networks or with
long sequences of data. These issues arise primarily from the way backpropagation is carried out
through time, and they significantly affect the learning process of RNNs. Let’s explore each of
these problems in detail.
1 Backpropagation Through Time (BPTT) in RNNs
To understand the vanishing and exploding gradient problems, it’s helpful to briefly review
Backpropagation Through Time (BPTT), the algorithm used to train RNNs.
During BPTT, the RNN is unfolded over time, effectively forming a very deep network in which each
layer represents the RNN’s state at a different time step.
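To make the unrolling picture concrete, here is a small illustrative sketch in NumPy (not taken from the lecture; the sequence length, dimensions, weight scales, and variable names such as W_xh and W_hh are arbitrary toy choices). It computes the hidden state h_t = tanh(W_xh x_t + W_hh h_{t-1}) step by step, reusing the same weight matrices at every time step, so the unrolled computation behaves like a T-layer-deep feedforward network:

import numpy as np

rng = np.random.default_rng(0)

T, input_dim, hidden_dim = 20, 4, 8                           # assumed toy sizes
W_xh = rng.normal(scale=0.3, size=(hidden_dim, input_dim))    # input-to-hidden weights
W_hh = rng.normal(scale=0.3, size=(hidden_dim, hidden_dim))   # hidden-to-hidden weights, shared across all time steps
xs = rng.normal(size=(T, input_dim))                          # a random input sequence

h = np.zeros(hidden_dim)                                      # initial hidden state
hs = []                                                       # one hidden state per "layer" of the unrolled network
for t in range(T):
    h = np.tanh(W_xh @ xs[t] + W_hh @ h)                      # h_t = tanh(W_xh x_t + W_hh h_{t-1})
    hs.append(h)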
In BPTT, the gradients of the loss with respect to the weights are computed by propagating
errors back through each time step. Crucially, this process involves repeated multiplication by the
derivative of the activation function and by the recurrent weight matrix, which can cause gradients to either
shrink exponentially to near zero or grow exponentially to very large values. This phenomenon
is at the root of both the vanishing and exploding gradient problems.
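Continuing the sketch above, the following rough illustration of the backward sweep (again an assumption for illustration, using a toy loss L = 0.5 * ||h_T||^2 on the final hidden state only) pushes the gradient with respect to the hidden state back one step at a time. Each step multiplies by the tanh derivative (1 - h_t^2) and by the transpose of the shared recurrent weight matrix, so the printed gradient norms shrink toward zero with the small weight scale used here; increasing the scale of W_hh produces the opposite, exploding behaviour:

grad_h = hs[-1].copy()                                        # dL/dh_T for the toy loss L = 0.5 * ||h_T||^2
for t in reversed(range(T)):
    # one BPTT step: dL/dh_{t-1} = W_hh^T @ diag(1 - h_t^2) @ dL/dh_t
    grad_h = W_hh.T @ ((1.0 - hs[t] ** 2) * grad_h)
    print(f"{T - t:2d} steps back: gradient norm = {np.linalg.norm(grad_h):.3e}")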
2 The Vanishing Gradient Problem
• Definition: The vanishing gradient problem occurs when gradients become exceedingly
small as they are propagated back through time. As a result, the parameters associated with early layers (or time steps)
receive little to no update during training. This makes it extremely difficult for the network to
learn dependencies that lie far back in the sequence, hindering the model’s ability to capture long-term
relationships.
• Mathematical Perspective: During backpropagation, each partial derivative contains a
factor that depends on the recurrent weights and the activation function’s derivative. Activation
functions such as sigmoid or tanh have derivatives bounded by 1 (at most 0.25 for the sigmoid), meaning that