Introducing Q-Learning

What is Q-Learning?

Given a state and an action, our Q-function outputs a state-action value (also called a Q-value).
Given a state-action pair, our Q-function searches its Q-table and outputs the value of that pair (the Q-value).
As training progresses, the Q-table improves, and it tells us the value of each state-action pair.
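The lookup described above can be sketched in a few lines. This is only a minimal illustration, assuming a small table stored as a list of lists; the sizes, the helper name `q_value`, and the value written into the table are hypothetical and not taken from the course code.

```python
# Hypothetical sizes, chosen only for illustration.
n_states, n_actions = 4, 2

# Q-table: one row per state, one column per action, initialized to zero.
q_table = [[0.0] * n_actions for _ in range(n_states)]


def q_value(q_table, state, action):
    """Q-function as a table lookup: return the stored state-action value."""
    return q_table[state][action]


# Stand-in for training: pretend one entry has been updated (made-up value).
q_table[2][1] = 0.5

# Look up the value of the pair (state=2, action=1).
value = q_value(q_table, 2, 1)

# With the table filled in, the best-valued action in a state is an argmax over its row.
best_action = max(range(n_actions), key=lambda a: q_table[2][a])
```

Before training every entry is zero, so the table carries no information; once entries are updated, comparing the values in a row is enough to pick an action in that state.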

The Q-Learning algorithm

Off-policy vs On-policy

Acting policy
Updating policy

An example

Let’s train our Q-Learning Taxi agent 🚕

Why do we set a -1 for each action?

