Thomas Simonini
1 min readDec 7, 2018

--

Hi Raj,

Q learning is a TD learning algorithm.

Policy Gradients is a episodic learning algorithm.

In fact in TD learning you evaluate at each timestep

In episodic learning you evaluate at the end of the episode the sequence of actions you took.

--

--

Thomas Simonini
Thomas Simonini

Written by Thomas Simonini

Developer Advocate 🥑 at Hugging Face 🤗| Founder Deep Reinforcement Learning class 📚 https://bit.ly/3QADz2Q |

Responses (1)