Thomas Simonini
May 30, 2018

Hi,

“Why are the values of reward not increasing, and why is the training loss not decreasing, in the provided output of the NN training?”

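Because the maximum reward our agent can get is 101 (for killing the monster), and the agent receives -1 at each timestep (to push it to kill the monster quickly). The episode reward is therefore capped: once the agent reliably kills the monster, it can only gain a few more points by killing it in fewer steps.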
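To make that arithmetic concrete, here is a small Python sketch. It assumes, for illustration only, that the +101 kill reward is received on the killing step and that the -1 penalty applies to every timestep before it, so killing the monster immediately yields the maximum of 101. The function name `episode_return` is hypothetical.

```python
def episode_return(steps_before_kill: int, kill_reward: int = 101, step_penalty: int = -1) -> int:
    """Total reward for an episode in which the monster is killed.

    Illustrative assumption: the agent receives `step_penalty` on each
    timestep before the kill and `kill_reward` on the killing step.
    """
    if steps_before_kill < 0:
        raise ValueError("steps_before_kill must be non-negative")
    return kill_reward + step_penalty * steps_before_kill


# The return never exceeds 101; a better agent only moves it closer to that ceiling.
for steps in (0, 5, 20):
    print(steps, episode_return(steps))
# prints:
# 0 101
# 5 96
# 20 81
```

Because the return is capped, the reward curve flattens quickly once the agent kills the monster consistently, and later progress only shows up as a few points gained from shorter episodes.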

You can see that the training loss is decreasing, because our agent gets better and better at predicting Q(s,a).
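The link between the loss and Q(s,a) prediction can be seen in the textbook DQN loss: the mean squared difference between the network's estimate Q(s,a) and the TD target r + γ·max_a' Q(s',a'). The sketch below is a plain-Python illustration of that formula, not the tutorial's own TensorFlow code; the function name `dqn_loss` and the default discount `gamma=0.95` are illustrative assumptions.

```python
from typing import Sequence


def dqn_loss(
    q_sa: Sequence[float],
    rewards: Sequence[float],
    next_q: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.95,  # illustrative discount factor, not taken from the tutorial
) -> float:
    """Mean squared TD error over a batch of transitions.

    q_sa[i]    : the network's current estimate Q(s_i, a_i)
    rewards[i] : reward r_i observed after taking a_i in s_i
    next_q[i]  : estimates Q(s'_i, a') for every action a' in the next state
    dones[i]   : True if s'_i is terminal (no bootstrapping from it)
    """
    if not q_sa:
        raise ValueError("empty batch")
    total = 0.0
    for q, r, nq, done in zip(q_sa, rewards, next_q, dones):
        # TD target: just r at the end of an episode, else r + gamma * max_a' Q(s', a')
        target = r if done else r + gamma * max(nq)
        total += (target - q) ** 2
    return total / len(q_sa)


batch = dict(rewards=[-1.0, 101.0], next_q=[[1.0, 2.0, 0.5], [0.0, 0.0, 0.0]], dones=[False, True], gamma=0.5)
print(dqn_loss([0.0, 10.0], **batch))   # poor estimate of the kill step: 4140.5
print(dqn_loss([0.0, 100.0], **batch))  # estimate close to the target: 0.5
```

As the estimates move toward their targets, the squared error, and therefore the loss, shrinks, even while the episode reward stays near its cap.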

Remember that DQN training has a lot of variability. We can use improvements to DQN (dueling networks, prioritized experience replay, etc.) to improve training.
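The core idea of each of those two improvements fits in a few lines. The Python below is a simplified sketch, not the implementation used in this tutorial: `dueling_q_values` shows how a dueling architecture combines a state-value stream V(s) with an advantage stream A(s,a), and `prioritized_probabilities` shows proportional prioritized replay, where transitions are sampled in proportion to their TD error. The function names and the defaults for `alpha` and `eps` are illustrative assumptions.

```python
from typing import List, Sequence


def dueling_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a dueling network's two streams into Q-values.

    Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))
    Subtracting the mean advantage keeps the V/A split identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def prioritized_probabilities(
    td_errors: Sequence[float], alpha: float = 0.6, eps: float = 1e-6
) -> List[float]:
    """Sampling probabilities for proportional prioritized experience replay.

    Priority p_i = (|delta_i| + eps) ** alpha and P(i) = p_i / sum_k p_k,
    so transitions with larger TD errors are replayed more often.
    eps keeps zero-error transitions sampleable; alpha=0 gives uniform sampling.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


print(dueling_q_values(10.0, [1.0, 2.0, 3.0]))  # [9.0, 10.0, 11.0]
print(prioritized_probabilities([0.1, 1.0, 5.0]))
```

A complete prioritized replay buffer would also apply importance-sampling weights to the loss to correct the bias introduced by non-uniform sampling; that part is omitted here.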

