DQN with Prioritized Experience Replay sudden drop in performance

I’ve been trying to implement a DQN for the CartPole environment, but I keep seeing sudden drops in performance. I assumed that adding prioritized experience replay (PER) would fix this, but it doesn’t seem to have helped. The PER part follows the usual proportional scheme; a rough sketch is below.
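A minimal sketch of the proportional PER buffer I mean (class name, alpha/beta values, and the numpy-based storage are illustrative, not my exact code):

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (Schaul et al., 2016)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.beta = beta        # importance-sampling correction strength
        self.eps = eps          # keeps priorities strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are seen at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.data) * probs[idxs]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        self.priorities[idxs] = np.abs(td_errors) + self.eps
```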

Setup:

* Replay buffer size: 1M transitions
* Max episode length: 1,000 steps
* Training: after every 50 games, sample 50,000 observations from the buffer
* Reward: 5 × height²
* Discount rate: 0.98
* Learning rate: 0.001
* Gradient norm clipped to 1
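For concreteness, one training step under those settings looks roughly like this sketch (PyTorch; the network sizes, the target network, and the Adam optimizer are illustrative assumptions, not necessarily exactly my code):

```python
import torch
import torch.nn as nn

GAMMA = 0.98          # discount rate from the setup above
LR = 1e-3             # learning rate
MAX_GRAD_NORM = 1.0   # gradient norm clip

# q_net / target_net: any nn.Module mapping states to per-action Q-values
# (4 inputs / 2 actions matches CartPole; the hidden size is arbitrary).
q_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
target_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)

def dqn_update(states, actions, rewards, next_states, dones, is_weights):
    """One DQN step: TD target, per-sample loss weighted by the PER
    importance-sampling weights, then a clipped gradient step.
    Returns the TD errors so the buffer priorities can be updated."""
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(1).values
        target = rewards + GAMMA * (1.0 - dones) * next_q
    td_error = target - q
    loss = (is_weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(q_net.parameters(), MAX_GRAD_NORM)
    optimizer.step()
    return td_error.detach()
```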

Any help/comments would be appreciated.

https://preview.redd.it/wt5epnk0fhwc1.png?width=895&format=png&auto=webp&s=d4317532b76418837eb12b28fb72570331900ad7

https://preview.redd.it/rz1a9jv1fhwc1.png?width=870&format=png&auto=webp&s=30efb5f8f9be21254902340ce85b3137b1c842bf

https://preview.redd.it/xi269203fhwc1.png?width=906&format=png&auto=webp&s=870046059e4aeeaf3855c83f089b8fde859183b6

submitted by /u/AUser213