Techniques

Pre-training

The initial phase of training a large model on massive, general datasets before task-specific fine-tuning.

Definition

Pre-training is the expensive, large-scale initial training phase where a model learns broad representations from billions or trillions of data points. For LLMs, this means predicting the next token on internet-scale text; for vision models, it may mean contrastive learning (CLIP), masked image modelling (MAE), or ImageNet classification.
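The next-token objective can be sketched in miniature. The following is a toy illustration only (a bigram count model over a few words, not a neural network), assuming a hypothetical tiny corpus; it shows the core idea that pre-training minimises the average negative log-likelihood of each actual next token under the model's predicted distribution.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; real pre-training uses internet-scale text.
corpus = "the cat sat on the mat"
tokens = corpus.split()

# Count bigrams: for each token, how often each possible next token follows it.
following = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    following[cur][nxt] += 1

def next_token_probs(token):
    """Predicted distribution over the next token, given the current one."""
    counts = following[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# The pre-training objective: average negative log-likelihood of the
# actual next token under the model's prediction (lower is better).
pairs = list(zip(tokens, tokens[1:]))
loss = -sum(math.log(next_token_probs(cur)[nxt]) for cur, nxt in pairs) / len(pairs)
```

An LLM replaces the bigram table with a transformer conditioned on the whole preceding context, but the loss being minimised is the same cross-entropy over next tokens.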

Pre-training is compute-intensive (training a frontier LLM costs tens to hundreds of millions of dollars in compute) but produces a powerful initialisation that can be adapted to many downstream tasks with comparatively little fine-tuning. This "pre-train once, fine-tune many times" paradigm has proven enormously economical.

Foundation models are the output of large-scale pre-training: models capable of many tasks that neither the training data nor the training objective explicitly targeted.

Examples

  • GPT-4 pre-training on Common Crawl
  • CLIP pre-training on image-text pairs
  • BERT masked language modelling