Context Window
The maximum number of tokens an LLM can process in a single request, determining how much text it can "see" at once.
Definition
The context window defines how much input (conversation history, documents, code) an LLM can process at once, measured in tokens. The original GPT-3 had a 2K token window (~1,500 words); frontier models now support 128K-2M tokens (entire books or codebases).
Context window size matters enormously for real-world applications: analysing a full legal contract, maintaining a long conversation, processing an entire codebase, or reading a research paper. Larger windows enable richer context but increase compute cost quadratically with vanilla attention (O(n²) in sequence length).
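The quadratic cost mentioned above can be made concrete with a back-of-envelope calculation. This is an illustrative sketch, not a measurement of any specific model: the head count and fp16 assumption are placeholder values.

```python
# Rough sketch of how self-attention memory scales with context length.
# The attention score matrix is n x n per head, so doubling the window
# quadruples this cost (O(n^2)). Head count and dtype are assumptions.

def attn_matrix_bytes(seq_len: int, num_heads: int = 32, dtype_bytes: int = 2) -> int:
    """Bytes for one layer's attention score matrices (fp16 assumed)."""
    return num_heads * seq_len * seq_len * dtype_bytes

for n in (4_000, 128_000):
    gib = attn_matrix_bytes(n) / 2**30
    print(f"{n:>7} tokens -> {gib:,.1f} GiB of scores per layer")
```

Going from 4K to 128K tokens is a 32x longer sequence but a 1,024x larger score matrix, which is why vanilla attention becomes impractical at long context lengths.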
Techniques such as sliding-window attention, sparse attention, and linear-attention approximations extend practical context length. Even within a supported window, models often retrieve information poorly from the middle of long contexts (the "lost in the middle" phenomenon), an active research area.
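Sliding-window attention can be sketched as a mask over the attention matrix: each token attends only to a fixed number of preceding tokens instead of the full prefix. This is a minimal illustrative sketch; the function name and window size are arbitrary choices, not any model's actual implementation.

```python
# Minimal sketch of a causal sliding-window attention mask: each query
# token may attend only to itself and the previous (window - 1) tokens,
# reducing cost from O(n^2) to O(n * window).

def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True where token i may attend to token j."""
    return [[(i - window < j <= i) for j in range(seq_len)]
            for i in range(seq_len)]

for row in sliding_window_mask(6, window=3):
    print("".join("#" if allowed else "." for allowed in row))
```

Printed as a grid, the allowed positions form a diagonal band of width 3 rather than the full lower triangle of standard causal attention.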
Examples
- GPT-4 Turbo (128K tokens)
- Claude 3 (200K tokens)
- Gemini 1.5 Pro (1M tokens)
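A practical use of the window sizes listed above is checking whether a document is likely to fit before sending it to a model. The sketch below uses the rough ~4 characters-per-token heuristic for English text; real counts depend on the model's tokenizer, and the dictionary keys are informal labels, not official API model names.

```python
# Back-of-envelope check of whether text fits a model's context window.
# Assumes ~4 characters per English token, a common rough heuristic;
# actual token counts vary by tokenizer and language.

WINDOWS = {
    "gpt-4-turbo": 128_000,
    "claude-3": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits(text: str, model: str) -> bool:
    return estimate_tokens(text) <= WINDOWS[model]

doc = "x" * 600_000  # ~150K estimated tokens
print({model: fits(doc, model) for model in WINDOWS})
```

For production use, count tokens with the model's own tokenizer rather than a heuristic, since code and non-English text can diverge sharply from 4 characters per token.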
Related Terms
Large Language Model (LLM)
A transformer-based AI system trained on billions of tokens of text, capable of generating, reasoning about, and transforming language.
Transformer
A neural network architecture using self-attention to process sequences in parallel — the foundation of all modern LLMs.
Tokenization
The process of splitting text into tokens (subword units) for processing by language models.