Raj Miglani
Thomas Simonini
Dec 7, 2018
Hi Raj,
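Q-learning is a TD (temporal-difference) learning algorithm.
Policy Gradients, in its basic Monte Carlo form (REINFORCE), is an episodic learning algorithm.
In TD learning, you evaluate and update at each timestep, using the reward you just received plus an estimate of the value of the next state.
In episodic learning, you wait until the end of the episode and then evaluate the whole sequence of actions you took, using the rewards actually collected.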
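
To make the difference concrete, here is a minimal sketch in Python. It is not taken from any library: the function names, the dictionary-based Q-table, and the values of GAMMA and ALPHA are all assumptions chosen for illustration. The first function can run after every single step, while the second one needs the complete list of rewards from a finished episode.

```python
GAMMA = 0.9  # discount factor (assumed value, for illustration only)
ALPHA = 0.1  # learning rate (assumed value, for illustration only)


def q_learning_step(q, state, action, reward, next_state, done, actions):
    """TD-style update: can be applied right after one timestep.

    `q` is a hypothetical Q-table stored as a dict keyed by (state, action).
    """
    # Bootstrap from the current estimate of the best next action.
    best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + GAMMA * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (td_target - old)
    return q[(state, action)]


def discounted_returns(rewards):
    """Episodic (Monte Carlo) evaluation: needs every reward of the episode.

    Returns the discounted return G_t for each timestep t. In a
    REINFORCE-style policy gradient, these returns would weight the
    log-probabilities of the actions that were taken.
    """
    returns = []
    g = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for r in reversed(rewards):
        g = r + GAMMA * g
        returns.append(g)
    returns.reverse()
    return returns
```

With these helpers, `q_learning_step` updates one Q-value as soon as a transition is observed, whereas `discounted_returns` can only be called once the episode has ended and all its rewards are known.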