Self-Supervised Learning
Learning representations from unlabelled data by creating supervisory signals from the data itself.
Definition
Self-supervised learning (SSL) creates labels automatically from the structure of the data itself, eliminating the need for expensive human annotation. The model is trained on a pretext task (e.g. predicting masked tokens, reconstructing masked image patches, or contrastive instance discrimination) that requires no human labels but forces the model to learn useful representations.
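The idea of deriving labels from the data itself can be sketched for masked-token prediction. This is an illustrative fragment, not any library's API: the function name `make_mlm_example`, the `MASK_ID` placeholder, and the `-100` ignore value (a common convention for positions excluded from the loss) are all assumptions for the sketch.

```python
import random

MASK_ID = 0  # hypothetical id reserved for a [MASK] token

def make_mlm_example(tokens, mask_prob=0.15, seed=42):
    """Build an (input, target) pair from raw tokens alone: mask some
    positions and use the original tokens as the prediction targets.
    No human annotation is involved; the data supervises itself."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)   # the model sees [MASK] here
            targets.append(tok)      # label = the original token
        else:
            inputs.append(tok)
            targets.append(-100)     # unmasked positions are ignored by the loss
    return inputs, targets

inp, tgt = make_mlm_example([5, 17, 9, 42, 8, 23, 11, 3], mask_prob=0.3)
```

A model trained to recover the masked tokens must encode enough context (syntax, semantics, world knowledge) to fill the blanks, which is exactly what makes the learned representations transferable.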
Language model pre-training (next-token prediction, masked language modelling) is a form of SSL. For vision, methods include MAE (Masked Autoencoder), DINO, and SimCLR. CLIP uses contrastive SSL on paired image-text data from the internet.
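The contrastive objective behind CLIP can be sketched with a symmetric cross-entropy over a batch of paired embeddings: each image's positive example is its own caption, and every other caption in the batch serves as a negative. This is a minimal NumPy sketch under assumed names (`clip_style_loss`, a default temperature of 0.07), not CLIP's actual implementation.

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of (image, text) pairs.
    Matching pairs sit on the diagonal of the similarity matrix."""
    # L2-normalise so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch) similarity scores
    labels = np.arange(len(logits))         # row i should match column i

    def xent(l):
        # numerically stable row-wise cross-entropy against the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

Minimising this loss pulls matching image and text embeddings together while pushing mismatched pairs apart, with the batch itself supplying the negatives.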
SSL has shown that representations learned without labels can be remarkably general, transferring to downstream tasks with minimal fine-tuning. It underpins the foundation model paradigm: SSL has largely displaced supervised pre-training for language and is challenging it in vision.
Examples
- GPT pre-training (predict next token)
- BERT (mask and predict)
- CLIP (contrastive image-text)
- DINO (self-distillation)
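The GPT-style pretext task in the list above derives its targets by simply shifting the input sequence: the label for position t is the token at position t+1. A one-line sketch (the function name is illustrative):

```python
def next_token_pairs(tokens):
    """GPT-style pretext task: input = tokens[:-1], target = tokens[1:].
    Every position's label comes from the data itself."""
    return tokens[:-1], tokens[1:]

x, y = next_token_pairs([1, 2, 3, 4])  # x = [1, 2, 3], y = [2, 3, 4]
```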
Related Terms
Pre-training
The initial phase of training a large model on massive, general datasets before task-specific fine-tuning.
Foundation Model
A large model trained on broad data that can be adapted to many downstream tasks via fine-tuning or prompting.
Transfer Learning
Reusing knowledge from a model trained on one task to improve learning on a different but related task.
Deep Learning
A subset of machine learning using neural networks with many layers to learn complex hierarchical representations.