
Reinforcement Learning (RL)

A learning paradigm where an agent learns by taking actions in an environment and receiving reward signals.

Definition

Reinforcement learning is a type of machine learning in which an agent interacts with an environment to maximise cumulative reward. Unlike in supervised learning, the agent receives no labelled examples; it must discover successful strategies through trial and error, guided only by the reward signal.
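Trial-and-error learning can be illustrated with the simplest RL setting, a multi-armed bandit. The sketch below is only an illustration under stated assumptions: the function name run_bandit, the Gaussian reward model, and the epsilon-greedy exploration rule are choices made for this example, not part of any particular system. The agent is never told which arm is best; it only sees the rewards of the arms it tries.

```python
import random


def run_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit (illustrative)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean of observed reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The only feedback is a noisy scalar reward, never a "correct" label.
        reward = rng.gauss(arm_means[arm], 1.0)

        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 0.9])
    print("estimated arm values:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With enough steps, the agent should end up pulling the arm with the highest true mean most often, even though it started with no information about any arm.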

The core formalism is a Markov Decision Process (MDP): at each step, the agent observes state s, takes action a, receives reward r, and transitions to a new state s'. The goal is to learn a policy π(a|s), a mapping from states to actions (or to a probability distribution over actions), that maximises expected cumulative reward, usually discounted by a factor γ. Key algorithms include Q-learning (value-based), REINFORCE and PPO (policy-gradient), and model-based approaches.
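As a minimal sketch of the value-based approach, the code below runs tabular Q-learning on a tiny, hypothetical five-state chain. The environment, the function names (step, q_learning, greedy_policy), and the hyperparameters are assumptions made for this illustration. Each update applies the standard rule Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)], where α is the learning rate and γ the discount factor. Because Q-learning is off-policy, the sketch explores with a uniformly random behaviour policy and reads off the greedy policy afterwards.

```python
import random

N_STATES = 5       # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain environment: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value beyond a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    return q


def greedy_policy(q):
    """Best-known action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print("greedy policy (1 = right):", greedy_policy(q_table))
```

In this toy problem the learned greedy policy should move right in every state, since that is the shortest path to the only reward. Practical systems typically replace the table with a neural network and the random behaviour policy with something like epsilon-greedy exploration.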

RL achieved landmark results with AlphaGo, which defeated top human Go professionals, and OpenAI Five, which defeated the reigning Dota 2 world champions in 2019. More recently, RLHF (Reinforcement Learning from Human Feedback) has become a widely used technique for aligning LLMs such as ChatGPT.

Examples

  • AlphaGo
  • OpenAI Five
  • ChatGPT (via RLHF)
  • Autonomous vehicle training
