The ep_rew_mean keeps decreasing

Hi everyone, I built a custom environment.

Basically, the reward function is a weighted sum of three terms: (1) the score, (2) soft-constraint penalties to avoid design violations, and (3) the change in the number of violations from the previous state to the current state. However, the agent seems to learn only to reduce the score (as shown below):

https://preview.redd.it/0tbygps61cxc1.png?width=1746&format=png&auto=webp&s=cbdc97b3d0a014063f6a222a2a0fe313a6a797f7
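
To make the reward structure concrete, here is a minimal sketch of a three-term weighted sum like the one described above. The names (`RewardWeights`, `compute_reward`, `soft_penalty`), the weight values, and the signs are all assumptions for illustration, not my actual code; in particular it assumes a higher score is better and that violations are penalized. Returning the individual terms alongside the total makes it possible to log them and check whether one term dominates the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the real values are not given in the post.
    score: float = 1.0
    soft_constraint: float = 0.5
    violation_delta: float = 0.5


def compute_reward(score, soft_penalty, prev_violations, curr_violations,
                   w=RewardWeights()):
    """Return (total_reward, per_term_breakdown) for one environment step."""
    # Positive when the number of violations grew since the previous state.
    violation_delta = curr_violations - prev_violations
    terms = {
        # Assumption: higher score is better. Flip the sign if it is a cost.
        "score": w.score * score,
        # Assumption: soft_penalty >= 0 measures how badly constraints are broken.
        "soft_constraint": -w.soft_constraint * soft_penalty,
        # Reward removing violations, penalize adding them.
        "violation_delta": -w.violation_delta * violation_delta,
    }
    return sum(terms.values()), terms


if __name__ == "__main__":
    total, terms = compute_reward(score=2.0, soft_penalty=1.0,
                                  prev_violations=3, curr_violations=1)
    print(total, terms)
```

If the per-term values differ by orders of magnitude, the smaller terms contribute almost nothing to the gradient signal, which could explain an agent that optimizes only one of them.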

For some reason, ep_rew_mean keeps decreasing, as shown below. If I understand correctly, ep_rew_mean is the mean of the cumulative reward per episode:

https://preview.redd.it/etzlprwr0cxc1.png?width=492&format=png&auto=webp&s=b389c8a36b5fe81d81a90f1780d246d5df6e83ee
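
For reference, this is a minimal sketch of how I understand a metric like ep_rew_mean to be computed: the undiscounted sum of rewards within each episode, averaged over a window of the most recent finished episodes. The class name `EpisodeReturnTracker` and the window size of 100 are assumptions for illustration; the actual library may compute it differently.

```python
from collections import deque


class EpisodeReturnTracker:
    """Tracks the mean return over the most recent finished episodes."""

    def __init__(self, window=100):
        # Only the last `window` episode returns are kept (assumed size).
        self.returns = deque(maxlen=window)
        self._running = 0.0

    def step(self, reward, done):
        # Accumulate the undiscounted reward of the current episode.
        self._running += reward
        if done:
            self.returns.append(self._running)
            self._running = 0.0

    def ep_rew_mean(self):
        # Undefined until at least one episode has finished.
        if not self.returns:
            return float("nan")
        return sum(self.returns) / len(self.returns)


if __name__ == "__main__":
    tracker = EpisodeReturnTracker(window=2)
    for reward, done in [(1.0, False), (2.0, True), (5.0, True),
                         (3.0, False), (4.0, True)]:
        tracker.step(reward, done)
    print(tracker.ep_rew_mean())
```

Under this definition, longer episodes with a negative per-step reward would also lower the mean, so a decreasing curve does not necessarily mean the per-step behavior is getting worse.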

The other training plots look normal, right?

https://preview.redd.it/5e0vf9bq0cxc1.png?width=1841&format=png&auto=webp&s=b7b3a712413618ed92e72d6a51d4683b03c2deef

Thank you everyone!

submitted by /u/Nice_Charge8971