Large Language Model (LLM)
A transformer-based AI system trained on billions of tokens of text, capable of generating, reasoning about, and transforming language.
Definition
Large Language Models are neural networks (typically Transformer-based) with billions to trillions of parameters, trained on internet-scale text corpora. They predict the next token in a sequence, and through this simple objective on enough data, they acquire broad linguistic and world knowledge.
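The next-token objective can be sketched with a toy stand-in for the learned model. Real LLMs use a Transformer over subword tokens; here, bigram counts over a tiny corpus play the role of the learned distribution, and generation is a greedy loop that repeatedly appends the most likely next token (illustrative only, not how production models are built).

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# "Training": count which token follows each token.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next token after `token`, or None."""
    counts = bigrams[token]
    return counts.most_common(1)[0][0] if counts else None

def generate(start, n_tokens=4):
    """Greedy decoding: repeatedly append the most likely next token."""
    out = [start]
    for _ in range(n_tokens):
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))  # "the cat sat on the"
```

The same loop structure (predict, append, repeat) is what an LLM runs at inference time, only with a neural network producing the next-token probabilities instead of bigram counts.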
The "large" matters: smaller models mostly memorise surface patterns, while large models develop emergent capabilities such as few-shot learning, reasoning, code generation, and translation, without being explicitly trained on those tasks. GPT-3, with 175 billion parameters, was a breakthrough in 2020; current frontier models likely exceed a trillion.
Post-training alignment via RLHF and Constitutional AI converts a raw language model into a helpful assistant. LLMs are now the foundation for most AI applications: search, code generation, document analysis, customer service, and AI agents.
Examples
- GPT-4 (OpenAI)
- Claude (Anthropic)
- Gemini (Google)
- LLaMA (Meta)
- Mistral
Want a deeper dive?
Read our full explainer with use cases, a how-it-works walkthrough, and FAQs.
Related Terms
Transformer
A neural network architecture using self-attention to process sequences in parallel — the foundation of all modern LLMs.
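The core of self-attention can be shown in a few lines of pure Python: each position scores every position with a scaled dot product, softmaxes the scores into weights, and takes a weighted sum of the values. This sketch is a single head with no learned projection matrices, which real Transformers add around this computation.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of plain-Python vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax the scores into attention weights.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weight-averaged value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0]]  # two 2-d token vectors
print(attention(x, x, x))
```

Because every position attends to every other position independently, the whole computation parallelises across the sequence, which is the property the definition above refers to.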
Generative AI
AI systems that create new content — text, images, audio, video, code — by learning patterns from training data.
Prompt Engineering
The practice of crafting effective text inputs to guide LLMs toward desired outputs.
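One common prompt-engineering pattern is few-shot prompting: prepending worked examples so the model infers the task and output format. A minimal sketch of assembling such a prompt as plain text (the example reviews and labels here are made up for illustration):

```python
examples = [
    ("great movie, loved it", "positive"),
    ("terrible plot, fell asleep", "negative"),
]

def build_prompt(examples, query):
    """Assemble a few-shot sentiment-classification prompt."""
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The query ends at "Sentiment:" so the model completes the label.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_prompt(examples, "an instant classic")
print(prompt)
```

Ending the prompt mid-pattern, right after "Sentiment:", is deliberate: the model's next-token prediction then naturally fills in the label.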
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human preference ratings guide language model fine-tuning to produce more helpful, harmless outputs.
Context Window
The maximum number of tokens an LLM can process in a single request, determining how much text it can "see" at once.
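The practical consequence of a fixed context window is that over-long input must be truncated, often keeping the most recent text. Real models count subword tokens (e.g. via BPE); whitespace-split words are a rough stand-in in this sketch:

```python
def truncate_to_window(text, max_tokens):
    """Keep only the last max_tokens 'tokens' so the most recent text fits."""
    tokens = text.split()
    return " ".join(tokens[-max_tokens:])

history = "turn one turn two turn three turn four"
print(truncate_to_window(history, 4))  # "turn three turn four"
```

Chat applications apply this kind of windowing to conversation history, which is why very old turns eventually stop influencing the model's replies.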
Hallucination
When an AI model generates plausible-sounding but factually incorrect or fabricated information.
Fine-tuning
Continuing training of a pre-trained model on domain-specific data to specialise it for a particular task.