Dec 7, 2018
Hi Raj,
Q-learning is a TD (temporal-difference) learning algorithm.
Policy Gradients, in its basic Monte Carlo form (REINFORCE), is an episodic learning algorithm.
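In TD learning, you evaluate and update at each timestep, using the reward you just received plus your current estimate of what comes next.
In episodic learning, you wait until the end of the episode and then evaluate the whole sequence of actions you took.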
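
To make the difference concrete, here is a minimal sketch in Python. It is my own illustration, not code from the article: the function names `q_learning_step` and `discounted_returns`, the dictionary-based Q-table, and the default values for `alpha` and `gamma` are all assumptions. The first function shows the kind of update you can run after every single step (TD), the second shows the returns you can only compute once the episode is over (the quantity a Monte Carlo policy gradient would weight its log-probabilities by).

```python
from collections import defaultdict


def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99, done=False):
    """One TD update, applied right after a single timestep.

    Q is a dict mapping (state, action) -> estimated value.
    alpha and gamma are assumed hyperparameters (learning rate, discount).
    """
    # Bootstrap from the current estimate of the next state (0 if terminal).
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def discounted_returns(rewards, gamma=0.99):
    """Discounted return for every timestep, computed after the episode ends.

    Needs the full list of rewards, which is why it is episodic.
    """
    returns = []
    g = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# TD: update immediately after one transition.
Q = defaultdict(float)
q_learning_step(Q, state=0, action=1, reward=1.0, next_state=1,
                actions=[0, 1], alpha=0.5, gamma=0.9)

# Episodic: collect all rewards first, then evaluate the whole episode.
episode_rewards = [1.0, 1.0, 1.0]
G = discounted_returns(episode_rewards, gamma=0.5)
```

Notice that `q_learning_step` only needs one transition, while `discounted_returns` cannot run until you have every reward of the episode.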