Doubt about Policy Gradient with Gaussian policy.

Hi,

I'm watching the Sergey Levine course available on YouTube.

I am now on Policy Gradients. Until now I have had zero problems, but one mathematical expression is confusing me.

In this video https://youtu.be/VSPYKXm_hMA?si=WIdrh41TX8RHXYu3&t=238 at 3:58

He uses the following equations:

https://preview.redd.it/dl9qxk3vj8zc1.png?width=426&format=png&auto=webp&s=b6f322d291b88dd3627cd16812bb0532fa684fcb

I do not understand why the log probability is equal to this distance, or how to compute its gradient.
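To make the question concrete, here is a minimal sketch of what I think the expression means. It assumes the slide defines a Gaussian policy whose mean is the network output f(s_t) with a fixed covariance Sigma; I simplify to a diagonal Sigma, and all names and numbers below are my own, not from the lecture. Under that assumption the log-density is -1/2 times the Sigma-weighted squared distance between the mean and the action, plus terms that do not depend on the mean. The sketch checks the gradient with respect to the mean against finite differences.

```python
import math

def gaussian_log_prob(action, mean, var):
    """Log-density of a diagonal Gaussian N(mean, diag(var)) at `action`."""
    # Weighted squared distance ||mean - action||^2_Sigma for diagonal Sigma.
    dist = sum((m - a) ** 2 / v for a, m, v in zip(action, mean, var))
    # Terms that do not depend on the mean (the "const" in the log-probability).
    const = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
    return -0.5 * dist + const

def grad_wrt_mean(action, mean, var):
    """Analytic gradient w.r.t. the mean: -Sigma^{-1} (mean - action)."""
    return [-(m - a) / v for a, m, v in zip(action, mean, var)]

def numeric_grad_wrt_mean(action, mean, var, eps=1e-6):
    """Central finite-difference gradient w.r.t. the mean, for comparison."""
    grads = []
    for i in range(len(mean)):
        up = list(mean)
        up[i] += eps
        down = list(mean)
        down[i] -= eps
        diff = gaussian_log_prob(action, up, var) - gaussian_log_prob(action, down, var)
        grads.append(diff / (2 * eps))
    return grads

# Hypothetical values: in a real policy, `mean` would be the network output f(s_t).
action = [0.3, -1.2]
mean = [0.5, -1.0]
var = [0.5, 0.8]

analytic = grad_wrt_mean(action, mean, var)
numeric = numeric_grad_wrt_mean(action, mean, var)
print(analytic, numeric)
```

If this reading is right, the gradient with respect to the network parameters would follow from the chain rule, multiplying the gradient above by df/dtheta, which an autodiff library computes during backpropagation.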

Thanks.

submitted by /u/RikoteMasterrrr