Thomas Simonini
1 min read · Jul 8, 2018


Hello,

I think you're referring to this line:

self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))

In fact, remember that to calculate the loss we need the real Q value, but we don't have it (if we had the real Q values, we wouldn't need a neural network ^^).

So we "bootstrap": we treat the real Q value as the reward for the current state and action, plus the discounted maximum Q value of the next state.

Hence Target_Q = r(s, a) + gamma * max_a' Q(s', a')
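As a minimal sketch of this bootstrapping step, here it is in plain NumPy with a tabular Q array indexed as Q[state, action] (the function names and the `done` flag are illustrative, not from the snippet above):

```python
import numpy as np

def td_target(reward, next_state, Q, gamma=0.95, done=False):
    # If the episode ended, there is no next state to bootstrap from.
    if done:
        return reward
    # Target_Q = r(s, a) + gamma * max_a' Q(s', a')
    return reward + gamma * np.max(Q[next_state])

def mse_loss(target_Q, Q_pred):
    # Mean squared TD error, the NumPy analogue of
    # tf.reduce_mean(tf.square(self.target_Q - self.Q))
    return np.mean((target_Q - Q_pred) ** 2)
```

The target is held fixed while we regress the network's prediction toward it, which is why the loss compares `target_Q` against the predicted `Q`.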
