Questions about PPO for duelling agents

I’m writing an AI composed of 2 different players playing a game, each with a different environment dimension. So, I need to know whether what I’m doing is theoretically correct.

I use an A2C network for the agents and renormalise the action probabilities after each move to reduce the agent's action space (masking out unavailable actions). I collect one rollout of (action, state, reward, done) tuples for each player, considering only that player's states and actions. I set a horizon of 128 moves per player; every time the horizon is reached, I compute GAE, renormalise the advantages, and train the network. Is this methodologically correct? Can I switch the environment, increasing its dimension, during training? Am I missing something?
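For reference, here is a minimal sketch of the two mechanisms described above — renormalising the policy's probabilities over the legal actions only, and computing GAE over a fixed horizon followed by advantage normalisation. Function names, the mask convention (1 = legal, 0 = illegal), and the toy values are my own illustrative assumptions, not the poster's actual code:

```python
import numpy as np

def masked_probs(logits, mask):
    """Renormalise action probabilities over legal actions only.
    mask: 1.0 for legal actions, 0.0 for illegal (assumed convention)."""
    probs = np.exp(logits - logits.max())  # stable softmax numerator
    probs = probs * mask                   # zero out illegal actions
    return probs / probs.sum()             # renormalise to sum to 1

def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation over one rollout of length T."""
    T = len(rewards)
    adv = np.zeros(T)
    acc = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        acc = delta + gamma * lam * nonterminal * acc
        adv[t] = acc
        next_value = values[t]
    return adv

def normalize(adv, eps=1e-8):
    """Renormalise advantages to zero mean and unit std before the update."""
    return (adv - adv.mean()) / (adv.std() + eps)
```

Each player would run these independently on its own 128-step buffer; the masking step changes the sampling distribution, so the log-probabilities used in the policy-gradient loss must come from the masked distribution, not the raw softmax.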

submitted by /u/Capitain-Nemo-9294
