Safety & Ethics

Hallucination

When an AI model generates plausible-sounding but factually incorrect or fabricated information.

Definition

AI hallucination occurs when a language model generates confident, fluent text that is factually wrong, contradicts the source material, or is entirely made up. The model doesn't "know" it's wrong — it's predicting probable tokens, not retrieving verified facts.

Hallucination arises from the statistical nature of LLM training: models learn to produce text that resembles their training data, which may itself include plausible-sounding falsehoods. Rarer facts are more prone to hallucination; common, well-attested facts are usually reliable.

Mitigation strategies include retrieval-augmented generation (RAG), which grounds responses in retrieved sources; citation requirements; self-consistency checking; and calibrated uncertainty estimates. Hallucination rates vary widely by model and task: frontier models hallucinate less on common knowledge and more on niche facts, citations, and numerical calculations.
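Self-consistency checking can be sketched as follows: sample the same question several times at nonzero temperature, then treat low agreement among the answers as a hallucination warning. This is a minimal illustration, not a production method; the function name, the string-normalization step, and the 0.6 agreement threshold are all illustrative assumptions, and the sampled answers are hard-coded here in place of real model calls.

```python
from collections import Counter

def self_consistency_vote(samples: list[str], threshold: float = 0.6):
    """Majority-vote over repeated model answers to the same prompt.

    `samples` would come from several model calls at nonzero temperature
    (hard-coded below for illustration). Returns the most common answer,
    the agreement ratio, and whether agreement clears `threshold` --
    low agreement is a signal the answer may be hallucinated.
    """
    # Normalize so trivially different strings ("Paris", "paris ") count as one answer.
    normalized = [s.strip().lower() for s in samples]
    answer, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    return answer, agreement, agreement >= threshold

# Simulated samples: 4 of 5 runs agree, so the answer passes the check.
answer, agreement, confident = self_consistency_vote(
    ["Paris", "paris", "Lyon", "Paris", "Paris "]
)
```

The key design choice is that agreement acts only as a confidence proxy: a model can be consistently wrong, so self-consistency reduces, but does not eliminate, hallucination risk.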

Examples

  • LLM fabricating a court case citation
  • AI describing a person who doesn't exist
  • Wrong statistics presented confidently