Models

Diffusion Model

A generative model that learns to create data by reversing a gradual noise-addition process.

Definition

Diffusion models are a class of generative models that learn to reverse a forward noising process. During training, real data is progressively corrupted with Gaussian noise over many steps until it becomes pure noise; the model learns to reverse this process — predicting and removing noise at each step.
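The forward process described above has a closed form, so any noise level can be sampled directly from clean data. A minimal NumPy sketch follows; the linear β-schedule and step count are illustrative assumptions in the style of DDPM, not the settings of any particular released model:

```python
import numpy as np

# Forward (noising) process sketch: q(x_t | x_0) has a closed form,
# so x_t at any step t can be sampled directly from the clean data x_0.
T = 1000                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)        # cumulative signal fraction at each step

def add_noise(x0, t, rng):
    """Sample x_t from q(x_t | x_0) and return the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps  # a denoising network is trained to predict eps from (xt, t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # stand-in for a real data sample
xt, eps = add_noise(x0, T - 1, rng)    # at t = T-1, xt is nearly pure noise
```

Training then amounts to minimizing the mean squared error between `eps` and the network's prediction of it, which is the simple objective used in DDPM-style setups.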

At inference, the model starts from pure random noise and iteratively denoises it over many steps; in text-conditioned models, each step is guided by a text prompt via cross-attention to a text encoder such as CLIP. This produces highly coherent images, audio, or video matching the prompt. Stable Diffusion, DALL-E 3, and Midjourney all use diffusion models.
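The iterative denoising loop can be sketched in the same spirit. Here `predict_noise` is a placeholder for the trained network (a real text-to-image model would also condition it on a prompt embedding), and the schedule values are illustrative assumptions:

```python
import numpy as np

# DDPM-style ancestral sampling sketch: start from Gaussian noise and
# step backwards through the schedule, removing the predicted noise.
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # illustrative linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(xt, t):
    # Placeholder for the learned noise-prediction network; a real model
    # would also take a prompt embedding via cross-attention.
    return np.zeros_like(xt)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))        # begin with pure random noise
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # Mean of the reverse step p(x_{t-1} | x_t) given the predicted noise
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:                          # no fresh noise is added at the last step
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
```

Each pass through the loop produces a slightly less noisy sample, so image quality trades off directly against the number of denoising steps; fast samplers reduce the step count well below the training-time `T`.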

Diffusion models have largely replaced GANs for image generation due to higher sample quality, more stable training, and better text-image alignment. They now extend to video (Sora), audio (AudioLDM), and 3D generation.

Examples

  • Stable Diffusion
  • DALL-E 3
  • Midjourney
  • Sora
  • Adobe Firefly