BERT
Google's 2018 bidirectional encoder model that transformed NLP by learning contextual word representations from unlabelled text.
Definition
BERT (Bidirectional Encoder Representations from Transformers) was released by Google in 2018 and marked a step-change in NLP performance. Unlike previous models that processed text left-to-right, BERT uses a masked language modelling objective (predicting randomly masked tokens) to learn bidirectional context: each word's representation is informed by the words on both its left and its right.
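The masking scheme described above can be sketched in plain Python. This is a simplified illustration of BERT's published recipe (select roughly 15% of positions; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged); the tiny vocabulary and function name are made up for the example:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, vocab=None, seed=0):
    """BERT-style masking: select ~15% of positions as prediction
    targets. Of the selected positions, 80% become [MASK], 10% are
    swapped for a random vocabulary token, and 10% stay unchanged.
    The model must recover the original token at every selected
    position, using context from both directions."""
    rng = random.Random(seed)
    vocab = vocab or ["the", "cat", "sat", "on", "mat"]  # toy vocab
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the training target for this position
            roll = rng.random()
            if roll < 0.8:
                masked[i] = "[MASK]"
            elif roll < 0.9:
                masked[i] = rng.choice(vocab)
            # else: keep the original token (the 10% "unchanged" case)
    return masked, labels
```

Because targets can sit anywhere in the sequence, the encoder cannot rely on a left-to-right reading order, which is what forces the bidirectional representations.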
Pre-trained on Wikipedia and BookCorpus, BERT achieved state-of-the-art results on 11 NLP benchmarks. Its bidirectional attention made it ideal for understanding tasks (sentiment analysis, NER, question answering) rather than generation. Fine-tuning BERT on task-specific data became the standard NLP workflow from roughly 2018 to 2021.
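For classification tasks, fine-tuning amounts to bolting a small task head (typically one linear layer plus softmax over a pooled [CLS] vector) onto the pre-trained encoder and training the whole stack on labelled data. A minimal pure-Python sketch of such a head, with toy dimensions and random numbers standing in for a real encoder output:

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(cls_vector, weights, bias):
    """Fine-tuning-style task head: one linear layer over the pooled
    [CLS] vector, followed by softmax over the class labels."""
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy stand-ins: a 4-dim "[CLS]" vector and a 3-class head. In real
# fine-tuning, cls_vector comes from BERT's encoder, and the head's
# weights are learned jointly with small updates to the encoder.
rng = random.Random(0)
cls_vector = [rng.gauss(0, 1) for _ in range(4)]
weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
bias = [0.0, 0.0, 0.0]
probs = classify(cls_vector, weights, bias)
```

The appeal of this workflow was that the expensive part (pre-training) was done once by Google, while each downstream task only needed this cheap head plus a few epochs of fine-tuning.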
BERT spawned many variants: RoBERTa (a more robust pre-training recipe), ALBERT (parameter sharing for a smaller model), DistilBERT (compressed via knowledge distillation), and DeBERTa (disentangled attention). While GPT-style decoder-only models now dominate, BERT-family encoders remain widely deployed for efficient embedding and classification tasks.
Examples
- Google Search (uses BERT for query understanding)
- DistilBERT (lightweight NLP)
- SentenceTransformers (embeddings)
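Encoders like those behind SentenceTransformers emit embedding vectors that are compared with cosine similarity. A self-contained sketch of that comparison, using made-up 3-dimensional vectors (real models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    ~1.0 means same direction (similar meaning), ~0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; a real encoder would produce these from text.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
banana = [0.1, 0.2, 0.95]
```

Here `cosine_similarity(king, queen)` comes out far higher than `cosine_similarity(king, banana)`, which is the property search and retrieval systems rely on.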
Related Terms
Transformer
A neural network architecture using self-attention to process sequences in parallel — the foundation of all modern LLMs.
Large Language Model (LLM)
A transformer-based AI system trained on billions of tokens of text, capable of generating, reasoning about, and transforming language.
GPT (Generative Pre-trained Transformer)
OpenAI's series of large decoder-only language models that established the paradigm for modern AI assistants.
Embedding
A dense numerical vector representation of data (text, images, audio) capturing semantic meaning.