PPO implementation

Hello community,

I need your help to check whether my PPO update function is implemented correctly. My scenario involves a graph-based state (dynamic size) and an actor with a selector layer that picks one of three options; the corresponding option layer then selects an action, so there are three option layers. Two of the option layers have discrete outputs and one has a multi-head output. Could you help me correct this?
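Since the screenshots aren't visible here, a minimal sketch of the kind of hierarchical action selection described above may help frame the question. All names (`select_action`, `selector_logits`, `option_logits`) are my assumptions, not the poster's code; the key point is that for PPO the stored log-probability should be the joint one, log p(option) + log p(action | option):

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs, rng):
    # Draw an index from a categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def select_action(selector_logits, option_logits, rng):
    """Hypothetical hierarchical select_action.

    First the selector picks an option, then that option's head
    picks an action. Returns (option, action, joint log-prob);
    the joint log-prob is what PPO's importance ratio should use
    for a hierarchical policy like this."""
    opt_probs = softmax(selector_logits)
    option = sample(opt_probs, rng)
    act_probs = softmax(option_logits[option])
    action = sample(act_probs, rng)
    log_prob = math.log(opt_probs[option]) + math.log(act_probs[action])
    return option, action, log_prob
```

In a real network these logits would come from learned layers (and the multi-head option would sum the log-probs of all its heads), but the composition of the two log-probabilities is the same.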

The select_action and update functions are as follows:

Snapshot of the actor network

Select_action function

Update function – Part 1

Update function – Part 2

Update function – Part 3
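For reference while reviewing the update parts above, here is a generic sketch of PPO's clipped surrogate loss for a single sample; it is standard PPO, not the poster's code, and `clip_eps` is an assumed hyperparameter name:

```python
import math

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    # Importance ratio between the current policy and the policy
    # that collected the data (using joint log-probs for a
    # hierarchical selector + option-head actor).
    ratio = math.exp(new_log_prob - old_log_prob)
    # Clipped surrogate objective: take the pessimistic (min) of
    # the unclipped and clipped terms, negated for minimization.
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return -min(unclipped, clipped)
```

Whatever the update function does internally, it should reduce to this objective per sample, with the log-probs recomputed from the same selector/option path that was taken at rollout time.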

submitted by /u/GuavaAgreeable208
