Techniques

Embedding

A dense numerical vector representation of data (text, images, audio) capturing semantic meaning.

Definition

An embedding is a mapping from a high-dimensional, discrete space (words, pixels, users) to a dense, continuous vector space where similar items are near each other. Neural networks learn embeddings by training on tasks that require understanding similarity or relationships in the data.
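The idea that "similar items are near each other" can be made concrete with cosine similarity, the standard nearness measure for embeddings. The 4-dimensional vectors below are hand-picked toy values (real models produce hundreds or thousands of dimensions), but the comparison logic is the same:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-d embeddings; a real model would learn these from data.
cat = [0.9, 0.8, 0.1, 0.0]
dog = [0.8, 0.9, 0.2, 0.1]
car = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(cat, dog))  # high: semantically similar
print(cosine_similarity(cat, car))  # low: unrelated
```

With these values, cat/dog score near 1.0 while cat/car score near 0.1, which is exactly the geometric notion of similarity an embedding model is trained to produce.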

Word embeddings (Word2Vec, GloVe) capture semantic relationships: "king" - "man" + "woman" ≈ "queen". Modern contextual embeddings from transformer models (BERT, OpenAI's text-embedding-ada-002) produce different vectors for the same word in different contexts.
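The analogy is plain vector arithmetic: subtract and add embedding vectors, then look for the nearest word to the result. The 3-dimensional vectors below are hand-constructed so the analogy holds (a toy sketch, not learned Word2Vec output):

```python
# Toy 3-d word vectors chosen by hand; real Word2Vec/GloVe vectors
# are learned from corpora and typically have 100-300 dimensions.
vectors = {
    "king":  [0.9, 0.9, 0.1],   # royalty + male
    "queen": [0.9, 0.1, 0.9],   # royalty + female
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],   # unrelated distractor
}

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]

def distance(a, b):
    # Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# king - man + woman
target = add(sub(vectors["king"], vectors["man"]), vectors["woman"])

# Nearest word to the result, excluding the query terms.
nearest = min(
    (w for w in vectors if w not in ("king", "man", "woman")),
    key=lambda w: distance(vectors[w], target),
)
print(nearest)  # queen
```

Excluding the query words from the candidate set is standard practice when evaluating analogies, since the result vector usually lands closest to one of the inputs.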

Embeddings are the backbone of RAG systems, semantic search, recommendation engines, and clustering. Vector databases like Pinecone and Weaviate store embeddings and enable approximate nearest-neighbour search at scale. Multimodal embeddings (CLIP) map text and images into a shared space.
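Semantic search over a small corpus reduces to ranking stored vectors by similarity to a query vector. The sketch below uses exact brute-force search with made-up document embeddings; vector databases replace the sorted scan with approximate-nearest-neighbour structures (e.g. HNSW graphs) to keep this fast across millions of vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical document embeddings, as an embedding API might return.
documents = {
    "doc_cats": [0.9, 0.8, 0.1],
    "doc_dogs": [0.8, 0.9, 0.2],
    "doc_cars": [0.1, 0.1, 0.9],
}

def search(query_vec, index, top_k=2):
    # Exact (brute-force) nearest-neighbour search: score every vector,
    # sort by similarity, return the top-k document ids.
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

query = [0.85, 0.85, 0.15]       # embedding of a query like "pets", say
print(search(query, documents))  # the two animal docs outrank the car doc
```

This is the retrieval step of a RAG pipeline: embed the query with the same model used for the documents, fetch the top-k neighbours, and pass their text to the language model as context.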

Examples

  • Word2Vec
  • OpenAI text-embedding-ada-002
  • CLIP (text + image embeddings)
  • Cohere Embed