DQN with Prioritized Experience Replay: sudden drops in performance

I’ve been trying to implement a DQN for the cart-pole environment, but I keep seeing sudden drops in performance. I assumed that adding prioritized experience replay (PER) would fix this, but it doesn’t seem to have helped.

The replay buffer has a size of 1M and each game has a maximum length of 1,000 steps; after 50 games, the model samples 50,000 observations. The reward is 5 × height^2, with a discount rate of 0.98 and a learning rate of 0.001, and I clip the gradient norm to 1.
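
To make the scale of these settings concrete, here is a rough sketch of the largest discounted return (and so the largest Q-value target) they allow. The reward formula, discount rate, and episode length are taken from the description above. The function names are made up for illustration, and the assumption that height is normalized to a maximum of 1.0 is only a guess, so treat the numbers as indicative rather than as the actual setup.

```python
# Settings taken from the description above.
GAMMA = 0.98          # discount rate
MAX_STEPS = 1_000     # maximum episode length
REWARD_SCALE = 5.0    # reward = 5 * height^2


def reward(height: float) -> float:
    """Per-step reward as described: 5 * height^2."""
    return REWARD_SCALE * height ** 2


def discounted_return_bound(max_height: float, steps: int = MAX_STEPS) -> float:
    """Discounted return if the maximum reward were earned at every step.

    Uses the geometric series
    sum_{t=0}^{steps-1} gamma^t = (1 - gamma^steps) / (1 - gamma).
    """
    r_max = reward(max_height)
    return r_max * (1.0 - GAMMA ** steps) / (1.0 - GAMMA)


if __name__ == "__main__":
    # ASSUMPTION (not stated in the post): height is normalized to at most 1.0.
    bound = discounted_return_bound(max_height=1.0)
    print(f"max per-step reward: {reward(1.0):.2f}")
    print(f"upper bound on discounted return: {bound:.2f}")
```

Under that assumption the effective horizon is 1 / (1 - 0.98) = 50 steps, so the Q-value targets can be roughly 50 times the maximum per-step reward. If height is on a different scale, the bound changes with its square.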

Any help/comments would be appreciated.




