May 30, 2018
Hi,
“Why are the reward values not increasing, and why is the training loss not decreasing, in the provided output of the NN training?”
The reward cannot keep increasing because it has a ceiling: the maximum reward our agent can get is 101 (for killing the monster), and it loses 1 point at each timestep (to push the agent to kill the monster quickly).
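To make the ceiling concrete, here is a minimal sketch of the arithmetic. It assumes the penalty is counted only for the timesteps spent before the kill and ignores any other reward terms the environment may have; the function name `episode_return` is made up for illustration.

```python
KILL_REWARD = 101  # reward for killing the monster (value from this reply)
STEP_PENALTY = 1   # points lost at each timestep (value from this reply)


def episode_return(steps_before_kill: int) -> int:
    """Total reward for an episode that ends with a kill.

    Assumption: only the timesteps spent before the kill are penalised.
    """
    return KILL_REWARD - STEP_PENALTY * steps_before_kill


# The faster the kill, the closer the return gets to the ceiling of 101.
for steps in (0, 10, 50):
    print(steps, episode_return(steps))
```

Under these assumptions, a well-trained agent's reward flattens out just below 101 instead of growing forever.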
You can see that the training loss is decreasing because our agent gets better and better at predicting Q(s, a).
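As a rough illustration of what that loss measures, the sketch below computes the squared temporal-difference error used in standard DQN for a single transition. The discount factor and all Q-values are made-up numbers, and the helper names are hypothetical; this is not the training code from the article.

```python
GAMMA = 0.95  # hypothetical discount factor, chosen only for illustration


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Standard DQN target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_sa, target):
    """Squared gap between the predicted Q(s, a) and the target."""
    return (target - q_sa) ** 2


# One transition with the per-timestep penalty of -1 and made-up next-state Q-values.
target = td_target(reward=-1.0, next_q_values=[2.0, 5.0, 3.0], done=False)

poor_prediction = squared_td_error(0.0, target)  # Q(s, a) far from the target
good_prediction = squared_td_error(3.5, target)  # Q(s, a) close to the target
print(poor_prediction, good_prediction)
```

The closer the predicted Q(s, a) is to the target, the smaller the loss, which is why the loss can fall even after the reward has flattened out.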
Remember that training a DQN has a lot of variability. We can use improvements to DQN (dueling, prioritized replay, etc.) to improve the training.