Architecture

Recurrent Neural Network (RNN)

A neural architecture that processes sequences by maintaining a hidden state across time steps.

Definition

RNNs process sequential data by maintaining a hidden state vector that summarises all previous inputs. At each time step the hidden state is updated from the current input and the previous state, typically as h_t = tanh(W_x x_t + W_h h_{t-1} + b). LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units) extend basic RNNs with gating mechanisms that learn what to remember and what to forget, mitigating the vanishing-gradient problem.
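The per-step update can be sketched in a few lines of NumPy. This is an illustrative toy, not a trainable model: the weight names, dimensions, and random initialisation below are all assumptions made for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN update: mix the current input with the
    previous hidden state, then squash through tanh."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # illustrative sizes
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

# Process a toy sequence of 5 steps; the hidden state carries
# a running summary of everything seen so far.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

print(h.shape)  # the summary vector has shape (hidden_dim,)
```

Note the sequential loop: each step depends on the previous hidden state, which is exactly why RNN training cannot be parallelised across time steps the way transformer training can.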

RNNs were the dominant sequence model from roughly 2014 to 2017 for language modelling, speech recognition, and machine translation. However, Transformers largely replaced them, offering better handling of long-range dependencies, parallelisable training over sequences, and superior scaling with data and compute.

RNNs persist in some edge and real-time applications where transformer inference latency is prohibitive, and newer architectures such as Mamba (a state space model) revisit recurrence as an alternative to transformers, with inference cost that grows linearly in sequence length.

Examples

  • LSTM for time series
  • GRU for machine translation (pre-2017)
  • Seq2Seq models