Techniques

Attention Mechanism

A neural network component that lets models dynamically focus on relevant parts of the input when generating each output token.

Definition

The attention mechanism, introduced in 2014 for machine translation and generalised to self-attention in the Transformer (2017), allows a model to weight the importance of different input positions when producing each output. Instead of compressing the entire input into a single fixed-length vector (as RNNs do), attention lets the model "look back" at relevant parts of the input at each decoding step.
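This weighting can be sketched as scaled dot-product attention: each query is compared against all keys, the scores are normalised with a softmax, and the values are averaged under those weights. A minimal NumPy sketch (shapes and the self-attention call are illustrative, not a production implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value by how well its key matches the query.

    Q: (n_q, d_k) queries; K: (n_kv, d_k) keys; V: (n_kv, d_v) values.
    Returns the (n_q, d_v) outputs and the (n_q, n_kv) attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over the key axis: weights for each query sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention: queries, keys, and values all come from the same input
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))  # 3 positions, dimension 4
out, w = scaled_dot_product_attention(X, X, X)
```

Each row of `w` is a probability distribution over input positions, which is exactly the "where to look" signal the definition describes.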

Self-attention (each position attends to all others in the same sequence) is the core Transformer operation. Multi-head attention runs several attention functions in parallel, allowing the model to attend to different aspects simultaneously — e.g., syntactic vs. semantic relationships.
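The split into parallel heads can be sketched as follows. This toy version divides the model dimension across heads and attends within each slice; the learned projection matrices (W_q, W_k, W_v, W_o) of a real Transformer are omitted for brevity:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(X, num_heads):
    """Split the model dimension into heads, attend in each, recombine.

    X: (seq_len, d_model); d_model must be divisible by num_heads.
    Learned projections are omitted, so each head just sees its own
    slice of the input features.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    # (num_heads, seq_len, d_head): one feature slice per head
    heads = X.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ heads  # independent attention per head
    # Concatenate the heads back into the model dimension
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

X = np.random.default_rng(1).normal(size=(5, 8))  # 5 tokens, d_model=8
Y = multi_head_self_attention(X, num_heads=2)
```

Because each head computes its own attention weights, the two heads here can focus on different positions for the same token, which is the mechanism behind attending to different relationship types in parallel.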

Attention is what enables LLMs to track long-range dependencies (e.g. resolving "it" to a noun mentioned many sentences earlier), and is a key reason Transformers overtook RNNs on language tasks.

Examples

  • Transformer self-attention in GPT-4
  • Cross-attention in DALL-E (text→image)
  • BERT's bidirectional attention