Techniques

RAG (Retrieval-Augmented Generation)

Grounding LLM responses by first retrieving relevant documents from a knowledge base before generating an answer.

Definition

RAG addresses the knowledge limitations of LLMs: their training data has a cutoff date, they hallucinate facts, and they have no access to private or proprietary data. RAG adds a retrieval step: given a user query, the system first searches a vector database for relevant document chunks, prepends them to the prompt as context, and then generates a response grounded in the retrieved material.
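The augmentation step is just prompt construction. A minimal sketch, assuming a hypothetical prompt template and already-retrieved chunks (the chunk text and citation format here are illustrative, not from any particular system):

```python
def build_augmented_prompt(query, chunks):
    """Prepend retrieved chunks to the user query as grounding context."""
    # Number each chunk so the model can cite it by index.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_augmented_prompt(
    "When was the policy updated?",
    ["The travel policy was last updated in March 2024."],  # hypothetical chunk
)
```

The resulting string is what actually gets sent to the LLM; the instruction to answer "using only the context" is what grounds the response.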

The pipeline involves four stages: (1) indexing: encoding documents as embeddings and storing them in a vector database (Pinecone, Weaviate, Chroma); (2) retrieval: encoding the query and finding its nearest neighbours among the stored embeddings; (3) augmentation: inserting the retrieved chunks into the prompt; (4) generation: the LLM produces a response citing the retrieved context.
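The four stages can be sketched end to end. This is a toy, self-contained version: a hashed bag-of-words vector stands in for a real embedding model, an in-memory list stands in for the vector database, and the generation call is omitted; the documents and query are hypothetical.

```python
import math

def embed(text, dims=64):
    """Toy embedding: hash tokens into a fixed-size count vector.
    A real system would call a learned embedding model instead."""
    vec = [0.0] * dims
    for tok in text.lower().split():
        vec[hash(tok.strip(".,?!")) % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# (1) Indexing: embed document chunks and store them (in-memory stand-in
# for a vector DB such as Pinecone, Weaviate, or Chroma).
docs = [
    "The travel policy was last updated in March 2024.",
    "Expense reports are due by the fifth of each month.",
]
index = [(embed(d), d) for d in docs]

# (2) Retrieval: embed the query and find its nearest neighbour.
query = "When was the travel policy updated?"
qvec = embed(query)
top = max(index, key=lambda item: cosine(qvec, item[0]))[1]

# (3) Augmentation: insert the retrieved chunk into the prompt.
prompt = f"Context: {top}\n\nQuestion: {query}\nAnswer:"

# (4) Generation: the prompt would now be sent to an LLM (omitted here).
```

In production the toy `embed` is replaced by a model such as a sentence-transformer, and nearest-neighbour search is delegated to the vector database's approximate index rather than a linear scan.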

RAG significantly reduces hallucination on factual tasks and enables LLMs to work with up-to-date or proprietary data without retraining. It is now the standard architecture for enterprise AI assistants, customer service bots, and research tools.

Examples

  • Perplexity AI (search + generate)
  • GitHub Copilot (code context)
  • Enterprise chatbots on internal docs