Hello,
I think you're referring to this line:
self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))
Remember that when we want to calculate the loss, we need the real Q-value, but we don't have it (if we had the real Q-values, we wouldn't need a neural network ^^).
So we “bootstrap”:
We consider that the real Q-value = reward for the current state and action + discount * max Q-value of the next state.
Hence target_Q = r(s, a) + gamma * max_a' Q(s', a')
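
To make this concrete, here is a minimal sketch of how the two pieces fit together in TF 1.x style, like the line above. Everything here (state_size, n_actions, the tiny two-layer network, the fake batch) is a placeholder I'm assuming for illustration, not your exact architecture:

import numpy as np
import tensorflow as tf  # written against TF 1.x, as in the snippet above

# Hypothetical sizes, for illustration only.
state_size, n_actions, gamma = 8, 4, 0.95

states   = tf.placeholder(tf.float32, [None, state_size])
actions  = tf.placeholder(tf.int32,  [None])
target_Q = tf.placeholder(tf.float32, [None])

# A tiny Q-network: one Q-value per action.
hidden = tf.layers.dense(states, 32, activation=tf.nn.relu)
Q_all  = tf.layers.dense(hidden, n_actions)

# Pick out Q(s, a) for the action actually taken in each transition.
Q = tf.reduce_sum(Q_all * tf.one_hot(actions, n_actions), axis=1)

# The loss from the post: mean squared TD error.
loss  = tf.reduce_mean(tf.square(target_Q - Q))
train = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Fake batch of transitions (s, a, r, s', done), just to run the graph.
    s    = np.random.randn(32, state_size).astype(np.float32)
    a    = np.random.randint(n_actions, size=32)
    r    = np.random.randn(32).astype(np.float32)
    s2   = np.random.randn(32, state_size).astype(np.float32)
    done = np.random.randint(2, size=32).astype(np.float32)

    # Bootstrapped target: r + gamma * max_a' Q(s', a'),
    # with no bootstrap term at terminal states.
    Q_next = sess.run(Q_all, {states: s2})
    target = r + gamma * (1.0 - done) * np.max(Q_next, axis=1)

    sess.run(train, {states: s, actions: a, target_Q: target})

Note that target is computed outside the graph and fed in as a constant, so gradients only flow through Q, not through the bootstrapped target.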