I’ve been working with a DQN model and have noticed some inconsistencies that I’m hoping to get some insights on.

During the training phase, I’ve been using the dqn_training function and loading weights. However, the actions the agent takes during testing are significantly different from those it took during training. In testing it keeps performing the same action even though the rewards are decreasing. The end goal is a higher reward, which is what I saw during training.

I’m particularly interested in understanding why there’s a discrepancy between the actions taken in training and in testing, and whether dqn.test is even the right function to use or whether dqn.forward would be a better choice.
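
To make the question more concrete, here is a minimal sketch of one thing I suspect could explain it. It assumes that training selects actions with an epsilon-greedy policy while testing always takes the highest-Q action. I have not confirmed that my setup works this way. The Q-values and the function names epsilon_greedy_action and greedy_action are made up for illustration and are not part of any library.

```python
import random

def greedy_action(q_values):
    """Test-style selection (assumed): always the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_action(q_values, epsilon, rng):
    """Training-style selection (assumed): random action with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)

# Hypothetical Q-values for a single state with three actions (illustrative only).
q_values = [0.10, 0.50, 0.45]
rng = random.Random(0)  # fixed seed so the run is repeatable

train_actions = [epsilon_greedy_action(q_values, 0.3, rng) for _ in range(1000)]
test_actions = [greedy_action(q_values) for _ in range(1000)]

print("distinct actions with exploration:", sorted(set(train_actions)))
print("distinct actions without exploration:", sorted(set(test_actions)))
```

If something like this is happening, the greedy side would repeat a single action whenever the Q-values for the visited states favour it, even if only by a small margin, which looks similar to what I see in testing.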

Any insights or suggestions would be greatly appreciated.

