Models

Context Window

The maximum number of tokens an LLM can process in a single request, determining how much text it can "see" at once.

Definition

The context window defines how much input (conversation history, documents, code) an LLM can process at once, measured in tokens. GPT-3-era models topped out at a 4K token window (~3,000 words); frontier models now support 128K-2M tokens, enough for books or entire codebases.

Context window size matters enormously for real-world applications: analysing a full legal contract, maintaining a long conversation, processing an entire codebase, or reading a research paper. Larger windows enable richer context, but with vanilla attention the compute cost grows quadratically in sequence length (O(n²)).
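The quadratic cost is easy to see with a little arithmetic: vanilla attention scores every token against every other token, so the score matrix has n × n entries. A minimal, model-agnostic sketch (function name is illustrative):

```python
# Illustrative sketch: the attention score matrix for n tokens has n * n
# entries per head, so doubling context length roughly quadruples the
# compute and memory of vanilla (full) attention.
def attention_matrix_entries(n_tokens: int) -> int:
    """Number of query-key scores computed per attention head."""
    return n_tokens * n_tokens

for n in (4_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_matrix_entries(n):>16,} scores per head")
```

Going from a 4K to a 128K window multiplies sequence length by 32 but the score-matrix size by 1,024, which is why long-context models rely on the approximation techniques described below.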

Techniques such as sliding-window attention, sparse attention, and linear-attention approximations extend practical context length. Even within a model's window, retrieval quality often degrades for information buried in the middle of long contexts (the "lost in the middle" phenomenon), which remains an active research area.
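Sliding-window attention can be sketched as a mask: each token attends only to itself and its most recent neighbours, cutting cost from O(n²) to O(n × window). This is a toy illustration of the idea, not any particular model's implementation:

```python
# Sketch of a causal sliding-window attention mask: token i may attend to
# token j only if j is within the last `window` positions up to and
# including i. Cost per token is bounded by `window` instead of n.
def sliding_window_mask(n_tokens: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if token i may attend to token j."""
    return [
        [(i - window < j <= i) for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

# Visualise the banded structure for a tiny sequence.
for row in sliding_window_mask(6, 3):
    print("".join("1" if allowed else "." for allowed in row))
```

Each printed row contains at most three 1s, showing the banded (diagonal) pattern that replaces the full lower-triangular causal mask.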

Examples

  • GPT-4 Turbo (128K tokens)
  • Claude 3 (200K tokens)
  • Gemini 1.5 Pro (1M tokens)