Reinforcement Learning (RL)
A learning paradigm where an agent learns by taking actions in an environment and receiving reward signals.
Definition
Reinforcement learning is a type of machine learning in which an agent interacts with an environment to maximise cumulative reward. Unlike supervised learning, the agent receives no labelled examples; it must discover successful strategies through trial and error.
The core formalism is the Markov Decision Process (MDP): at each step, the agent observes state s, takes action a, receives reward r, and transitions to a new state s′. The goal is to learn a policy π(a | s) that maximises expected cumulative future reward. Key algorithms include Q-learning (value-based), REINFORCE and PPO (policy-gradient), and model-based approaches.
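The update loop behind value-based methods can be made concrete with tabular Q-learning. The sketch below uses a hypothetical toy MDP (a 5-state chain where moving right from the last state earns reward 1); the environment, state count, and hyperparameters are illustrative assumptions, not from any specific library.

```python
import random

N_STATES = 5          # states 0..4 in a 1-D chain (toy environment, assumed)
ACTIONS = [0, 1]      # 0 = left, 1 = right

def step(state, action):
    """Return (next_state, reward, done) for the chain MDP."""
    if action == 1:
        if state == N_STATES - 1:
            return state, 1.0, True        # goal reached, episode ends
        return state + 1, 0.0, False
    return max(state - 1, 0), 0.0, False   # moving left, floor at state 0

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection, breaking ties randomly
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[s])
                a = rng.choice([x for x in ACTIONS if Q[s][x] == best])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap off the best next-state value
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)  # the learned greedy policy moves right toward the goal
```

With γ = 0.9 the learned values decay geometrically with distance from the goal (Q[4][1] ≈ 1.0, Q[3][1] ≈ 0.9, and so on), illustrating how reward signals propagate backward through the state space with no labelled examples involved.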
RL achieved landmark results in AlphaGo (defeating human Go champions) and OpenAI Five (Dota 2). More recently, RLHF (Reinforcement Learning from Human Feedback) became the key alignment technique for LLMs like ChatGPT.
Examples
- AlphaGo
- OpenAI Five
- ChatGPT (via RLHF)
- Autonomous vehicle training
Want a deeper dive?
Read our full explainer with use cases, how-it-works, and FAQs.
Related Terms
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human preference ratings guide language model fine-tuning to produce more helpful, harmless outputs.
Machine Learning (ML)
A subset of AI where systems learn patterns from data rather than following explicitly programmed rules.
AI Agents
Autonomous AI systems that perceive their environment, plan multi-step actions, and use tools to complete tasks.