Question about policy gradient with a Gaussian policy


I'm watching the Sergey Levine course available on YouTube.

I've reached the policy gradient lectures. Up to now I've had no problems, but one mathematical expression is confusing me.

In this video, at 3:58, he uses the following equations:

I do not understand why the log probability is equal to this distance, or how to compute its gradient.
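
The equations from the video are not reproduced in this post, so the following is only a sketch under an assumption: that the policy is the usual Gaussian one, where the action a is drawn from N(mu, Sigma), the mean mu is the output of some function of the state, and Sigma is fixed. Taking the log of the Gaussian density gives -1/2 (mu - a)^T Sigma^{-1} (mu - a) plus terms that do not depend on mu, which is where a "distance" (a squared Mahalanobis distance) would come from. Differentiating with respect to mu cancels the 1/2 and gives -Sigma^{-1} (mu - a), and the chain rule carries that through to the parameters. The linear mean `mu = W @ s`, the numbers, and all function names below are hypothetical, chosen only so the result can be checked against finite differences.

```python
import numpy as np


def gaussian_log_prob(a, mu, sigma):
    """Log density of N(mu, sigma) evaluated at action a."""
    k = a.shape[0]
    diff = mu - a
    # Squared Mahalanobis distance: (mu - a)^T Sigma^{-1} (mu - a)
    maha = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    # Normalisation term; it does not depend on mu, so its gradient w.r.t. mu is zero
    const = -0.5 * (k * np.log(2.0 * np.pi) + logdet)
    return -0.5 * maha + const


def grad_log_prob_wrt_mean(a, mu, sigma):
    """Analytic gradient of the log density w.r.t. the mean: -Sigma^{-1} (mu - a)."""
    return -np.linalg.solve(sigma, mu - a)


# Hypothetical toy policy: the mean is linear in the state, mu = W @ s.
rng = np.random.default_rng(0)
s = rng.normal(size=3)                       # state
W = rng.normal(size=(2, 3))                  # policy parameters
sigma = np.array([[0.5, 0.1], [0.1, 0.3]])   # fixed, positive-definite covariance
a = rng.normal(size=2)                       # sampled action

mu = W @ s
g_mu = grad_log_prob_wrt_mean(a, mu, sigma)
# Chain rule: d log pi / dW[i, j] = g_mu[i] * s[j]
g_W = np.outer(g_mu, s)

# Central finite differences on each entry of W as a sanity check
eps = 1e-6
g_W_fd = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        g_W_fd[i, j] = (
            gaussian_log_prob(a, W_plus @ s, sigma)
            - gaussian_log_prob(a, W_minus @ s, sigma)
        ) / (2 * eps)

print(np.allclose(g_W, g_W_fd, atol=1e-5))
```

With a neural network in place of the linear map, the same chain rule applies: the term -Sigma^{-1} (mu - a) is backpropagated through the network that produces mu, which an autodiff library would normally handle.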


submitted by /u/RikoteMasterrrr