Hello,
I think you're referring to this line:
self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))
Remember that when we want to calculate the loss, we need the real Q-value, but we don't have it (if we had the real Q-values, we wouldn't need a neural network ^^).
So we “bootstrap”:
We consider that the real Q-value = reward for the current state and action + discount * max Q-value of the next state.
Hence target_Q = r(s, a) + gamma * max_a' Q(s', a')
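
To make this concrete, here is a minimal sketch of how the two pieces fit together in TF 1.x style, like the line above. Everything here (state_size, n_actions, the tiny two-layer network, the fake batch) is a placeholder I'm assuming for illustration, not your exact architecture:

import numpy as np
import tensorflow as tf  # written against TF 1.x, as in the snippet above

# Hypothetical sizes, for illustration only.
state_size, n_actions, gamma = 8, 4, 0.95

states   = tf.placeholder(tf.float32, [None, state_size])
actions  = tf.placeholder(tf.int32,  [None])
target_Q = tf.placeholder(tf.float32, [None])

# A tiny Q-network: one Q-value per action.
hidden = tf.layers.dense(states, 32, activation=tf.nn.relu)
Q_all  = tf.layers.dense(hidden, n_actions)

# Pick out Q(s, a) for the action actually taken in each transition.
Q = tf.reduce_sum(Q_all * tf.one_hot(actions, n_actions), axis=1)

# The loss from the post: mean squared TD error.
loss  = tf.reduce_mean(tf.square(target_Q - Q))
train = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Fake batch of transitions (s, a, r, s', done), just to run the graph.
    s    = np.random.randn(32, state_size).astype(np.float32)
    a    = np.random.randint(n_actions, size=32)
    r    = np.random.randn(32).astype(np.float32)
    s2   = np.random.randn(32, state_size).astype(np.float32)
    done = np.random.randint(2, size=32).astype(np.float32)

    # Bootstrapped target: r + gamma * max_a' Q(s', a'),
    # with no bootstrap term at terminal states.
    Q_next = sess.run(Q_all, {states: s2})
    target = r + gamma * (1.0 - done) * np.max(Q_next, axis=1)

    sess.run(train, {states: s, actions: a, target_Q: target})

Note that target is computed outside the graph and fed in as a constant, so gradients only flow through Q, not through the bootstrapped target.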