Techniques

Pre-training

The initial phase of training a large model on massive, general datasets before task-specific fine-tuning.

Definition

Pre-training is the expensive, large-scale initial training phase where a model learns broad representations from billions or trillions of data points. For LLMs, this means predicting the next token on internet-scale text; for vision models, it may mean contrastive learning (CLIP), masked image modelling (MAE), or ImageNet classification.
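The next-token objective can be sketched in miniature. The following is a toy illustration only (a bigram count model over a few words, not a neural network), assuming a hypothetical tiny corpus; it shows the core idea that pre-training minimises the average negative log-likelihood of each actual next token under the model's predicted distribution.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; real pre-training uses internet-scale text.
corpus = "the cat sat on the mat"
tokens = corpus.split()

# Count bigrams: for each token, how often each possible next token follows it.
following = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    following[cur][nxt] += 1

def next_token_probs(token):
    """Predicted distribution over the next token, given the current one."""
    counts = following[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# The pre-training objective: average negative log-likelihood of the
# actual next token under the model's prediction (lower is better).
pairs = list(zip(tokens, tokens[1:]))
loss = -sum(math.log(next_token_probs(cur)[nxt]) for cur, nxt in pairs) / len(pairs)
```

An LLM replaces the bigram table with a transformer conditioned on the whole preceding context, but the loss being minimised is the same cross-entropy over next tokens.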

Pre-training is compute-intensive (training a frontier LLM costs tens to hundreds of millions of dollars in compute) but produces a powerful initialisation that can be adapted to many downstream tasks with comparatively little fine-tuning. This "pre-train once, fine-tune many times" paradigm has proven enormously economical.

Foundation models are the output of large-scale pre-training: models capable of many tasks that neither the training data nor the training objective explicitly targeted.

Examples

  • GPT-4 pre-training on Common Crawl
  • CLIP pre-training on image-text pairs
  • BERT masked language modelling