Instructor: Jaskirat Singh
October 7, 2024
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are types of Recurrent
Neural Networks (RNNs) designed to address the limitations of traditional RNNs. In this
explanation, I’ll cover the motivation behind LSTMs and GRUs, their internal architectures, and
their differences and similarities, all within the context of improving sequential learning.
1 Recurrent Neural Networks (RNNs) Recap
RNNs are a type of neural network specifically designed for sequential data, such as time series,
natural language, or any data where previous inputs can provide context for the current output.
They maintain a hidden state that is fed back into the network at every step, so the state at
step t depends on both the current input and the state at step t-1. This allows information to
persist from one step in the sequence to the next.
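To make the recurrence concrete, here is a minimal sketch of a vanilla RNN with a single scalar hidden unit, using only the Python standard library. The function names (rnn_step, run_rnn), the weight names (w_xh, w_hh, b_h), and the weight values are illustrative assumptions, not notation taken from these notes; a real RNN uses weight matrices and vector-valued states.

```python
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One step of a scalar vanilla RNN: h_t = tanh(w_xh*x_t + w_hh*h_prev + b_h)."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b_h)

def run_rnn(xs, w_xh=0.5, w_hh=0.8, b_h=0.0, h0=0.0):
    """Feed a sequence through the RNN and return the hidden state after every step."""
    h = h0
    states = []
    for x_t in xs:
        # The same weights are reused at every step; only the hidden state changes.
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# A single non-zero input at step 0, followed by zeros.
states = run_rnn([1.0, 0.0, 0.0, 0.0])
print(states)
```

With these assumed weights, the input at step 0 still influences the later hidden states even though the later inputs are zero, but its influence shrinks at every step.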
The standard RNN has the following limitations:
• Vanishing and Exploding Gradients: During backpropagation, the gradients used to
update the model parameters can become very small (vanish) or excessively large (explode),
leading to ineffective training. This is especially problematic for long sequences, making it
difficult for RNNs to learn long-term dependencies.
• Short-term Memory: Largely as a consequence of vanishing gradients, traditional RNNs
tend to “forget” information from early in the sequence, which limits their capacity to
use context from more than a few steps in the past.
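The gradient problem listed above can be illustrated with a deliberately simplified sketch. It assumes a scalar, linear recurrence h_t = w_hh * h_{t-1} (no activation function, no inputs), so that backpropagating through each step multiplies the gradient by the same factor w_hh. The function name gradient_through_time and the chosen values are hypothetical; in a real RNN the factor is a Jacobian matrix that also depends on the activation derivative.

```python
def gradient_through_time(w_hh, num_steps):
    """Return d h_T / d h_0 for the linear recurrence h_t = w_hh * h_{t-1}.

    Each backward step multiplies the gradient by w_hh, so the result
    equals w_hh ** num_steps.
    """
    grad = 1.0
    for _ in range(num_steps):
        grad *= w_hh
    return grad

vanishing = gradient_through_time(0.5, 50)   # |w_hh| < 1: gradient shrinks toward 0
exploding = gradient_through_time(1.5, 50)   # |w_hh| > 1: gradient grows without bound
stable = gradient_through_time(1.0, 50)      # |w_hh| = 1: gradient is preserved

print(vanishing, exploding, stable)
```

Under this simplification, only a recurrent factor very close to 1 lets a learning signal survive 50 steps, which is why long-range dependencies are hard for a plain RNN to learn.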
To overcome these issues, LSTMs and GRUs were developed. They incorporate mechanisms
that make it easier to retain long-term dependencies, improving the performance of RNNs for
tasks that require memory over long time periods.
2 Long Short-Term Memory (LSTM)
LSTM is a type of RNN that was introduced by Sepp Hochreiter and Jürgen Schmidhuber in
1997. The key idea behind LSTMs is to use gates together with a dedicated memory cell to
regulate the flow of information, thus mitigating the vanishing gradient problem and making it
easier to learn long-term dependencies.
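As a rough illustration of how gates and a memory cell interact, here is a sketch of one step of a scalar LSTM cell in the commonly used formulation with forget, input, and output gates. This is an assumption about the formulation, not necessarily the notation used in these notes or the exact 1997 design. The names lstm_step and params and all weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each of "forget", "input", "output", "candidate"
    to a tuple (w_x, w_h, b) of illustrative scalar weights.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i_t = sigmoid(pre("input"))        # how much new information to write
    o_t = sigmoid(pre("output"))       # how much of the cell state to expose
    g_t = math.tanh(pre("candidate"))  # candidate value to write
    c_t = f_t * c_prev + i_t * g_t     # additive update of the memory cell
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Assumed weights: forget gate almost fully open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

h, c = 0.0, 1.0
for x_t in [0.3, -0.7, 0.2] * 30:  # 90 steps of arbitrary input
    h, c = lstm_step(x_t, h, c, params)
print(c)
```

With the forget gate near 1 and the input gate near 0, the cell state stays close to its initial value across many steps. That is the mechanism by which the memory cell can carry information over long spans.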