Visual representation of the SAC policy's probability density function after applying tanh to the normal sample

Desmos PDF graph

I created this because I didn’t understand how to bound the std (standard deviation) that my policy outputs. I heard that clipping the value to the interval (0, 1] works well, but a plain clamp loses the gradient outside the interval: the policy can start outputting values below 0 and never recover. So I tried using a sigmoid as the activation for the std, but after a while it bricked the policy, leaving the entropy stuck at 1.
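For anyone who wants to reproduce the curve outside Desmos, here is a minimal sketch of the density of a = tanh(u) with u ~ Normal(mu, sigma), obtained from the change of variables p(a) = N(atanh(a); mu, sigma) / (1 - a^2). The function names and the 0.5 scale are assumptions for illustration, not taken from any particular SAC implementation.

```python
import math


def squashed_gaussian_pdf(a, mu, sigma):
    """Density of a = tanh(u), u ~ Normal(mu, sigma), for a in (-1, 1)."""
    u = math.atanh(a)  # invert the squashing
    normal_pdf = math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi)
    )
    # Jacobian of tanh: da/du = 1 - tanh(u)^2 = 1 - a^2
    return normal_pdf / (1.0 - a * a)


def bounded_std(raw, scale=0.5):
    """Hypothetical helper: map a raw network output to (0, scale)
    with a scaled sigmoid. Assumes moderate |raw| (no overflow guard)."""
    return scale / (1.0 + math.exp(-raw))


# Compare the density at the center and near the edge for a few std values.
for sigma in (0.3, 0.5, 1.0):
    center = squashed_gaussian_pdf(0.0, 0.0, sigma)
    edge = squashed_gaussian_pdf(0.9, 0.0, sigma)
    print(f"sigma={sigma}: p(0)={center:.3f}, p(0.9)={edge:.3f}")
```

With mu = 0 the density has a single peak at 0 as long as sigma stays below 1/sqrt(2) (about 0.707); above that the mass splits into two peaks that move toward -1 and 1 as sigma grows. That is one possible reason to keep the std well under 1, though it only covers the mu = 0 case.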

After looking at the PDF, I think I’ll try sigmoid * 0.5. Maybe I should instead change the alpha in αH(a(s)) dynamically, but I’m not sure. If you had trouble with entropy regularization as well and dynamically changing the alpha helped you, please tell me. Anyway, I did not find this anywhere else on the internet, and it looks helpful.
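On changing alpha dynamically: a common approach is to treat log(alpha) as a learnable scalar and push the policy's entropy toward a target value. The sketch below is plain Python with a hand-written gradient, assuming the loss J = -log_alpha * mean(log_pi + target_entropy), which is one widely used variant. The function name, learning rate, and batch values are made up for illustration, and the target of -action_dim is a heuristic, not a rule.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on
    J = -log_alpha * mean(log_pi + target_entropy).
    Hypothetical helper; a real setup would use an optimizer instead."""
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + target_entropy)  # dJ / d(log_alpha)
    return log_alpha - lr * grad


action_dim = 2
target_entropy = -float(action_dim)  # heuristic: -dim(action space)
log_alpha = 0.0                      # alpha starts at 1.0

# Made-up log-probabilities of actions sampled from the current policy.
batch_log_probs = [1.5, 2.5, 3.0]

for _ in range(1000):
    log_alpha = update_log_alpha(log_alpha, batch_log_probs, target_entropy)

print(f"alpha after updates: {math.exp(log_alpha):.4f}")
```

Here the estimated entropy (the negative mean log-probability) is below the target, so alpha grows and the entropy bonus gets more weight. When the entropy is above the target, the same update shrinks alpha.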

submitted by /u/O_CLIPE