Pre-training
The initial phase of training a large model on massive, general datasets before task-specific fine-tuning.
Definition
Pre-training is the expensive, large-scale initial training phase where a model learns broad representations from billions or trillions of data points. For LLMs, this means predicting the next token on internet-scale text; for vision models, it may mean contrastive learning (CLIP), masked image modelling (MAE), or ImageNet classification.
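The next-token objective described above is just average cross-entropy between the model's predicted distribution and the actual next token. A minimal sketch, assuming `logits` is a list of per-step vocabulary scores (the toy inputs here are illustrative, not from any real model):

```python
import math

def next_token_loss(logits, tokens):
    """Average cross-entropy of predicting tokens[t+1] from the
    model's logits at step t -- the core pre-training objective."""
    total = 0.0
    for t in range(len(tokens) - 1):
        row = logits[t]
        # numerically stable log-softmax normaliser over the vocabulary
        m = max(row)
        logz = m + math.log(sum(math.exp(x - m) for x in row))
        # cross-entropy term: -log p(tokens[t+1] | context)
        total += logz - row[tokens[t + 1]]
    return total / (len(tokens) - 1)
```

With uniform logits over a vocabulary of size V, the loss is ln V, the entropy of random guessing; pre-training drives this number down as the model learns the statistics of its corpus.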
Pre-training is compute-intensive (training frontier LLMs costs tens to hundreds of millions of dollars) but produces a powerful initialisation that can be quickly adapted to many downstream tasks via fine-tuning. This "pre-train once, fine-tune many" paradigm has proven enormously economical: the one-time pre-training cost is amortised across many comparatively cheap adaptations.

Foundation models are the output of large-scale pre-training: models capable of many tasks that neither the training data nor the training objective explicitly targeted.
Examples
- GPT-3 pre-training on filtered Common Crawl text
- CLIP pre-training on image-text pairs
- BERT masked language modelling
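The masked language modelling objective in the last example works by hiding a fraction of input tokens and training the model to recover them. A minimal sketch of BERT-style input masking, where `MASK_ID` and the `-100` ignore-label convention are illustrative assumptions (real pipelines also sometimes substitute random tokens instead of the mask):

```python
import random

MASK_ID = 103  # hypothetical id for the [MASK] token

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masking: replace ~mask_prob of tokens with [MASK].
    The model is trained to predict the original token at each masked
    position; unmasked positions get a sentinel label (-100) that the
    loss ignores."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)
            targets.append(tok)    # recover the original here
        else:
            masked.append(tok)
            targets.append(-100)   # no loss at this position
    return masked, targets
```

Unlike next-token prediction, this objective lets the model condition on context from both sides of each masked position, which is why BERT-style encoders excel at understanding tasks rather than generation.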
Related Terms
Fine-tuning
Continuing training of a pre-trained model on domain-specific data to specialise it for a particular task.
Transfer Learning
Reusing knowledge from a model trained on one task to improve learning on a different but related task.
Large Language Model (LLM)
A transformer-based AI system trained on billions of tokens of text, capable of generating, reasoning about, and transforming language.
Foundation Model
A large model trained on broad data that can be adapted to many downstream tasks via fine-tuning or prompting.