Input function for NN in PPO, Mlp policy

I have a custom environment that I train with PPO from SB3, using the MlpPolicy but with 512 neurons per layer and 8 layers. I have the following observation space:

min_obs = np.array([[-np.inf, -np.inf, -2.5, -2.5]] * len(self.agents), dtype=np.float32)

max_obs = np.array([[np.inf, np.inf, 2.5, 2.5]] * len(self.agents), dtype=np.float32)

self.observation_space = spaces.Box(low=min_obs, high=max_obs, dtype=np.float32)
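Here is a minimal sketch of what that input looks like. It assumes SB3's default feature extractor for MlpPolicy with a Box space, which is FlattenExtractor (it applies torch.nn.Flatten). As far as I know, SB3 only casts non-image Box observations to float and does not rescale them. The sketch uses NumPy in place of torch. N_AGENTS, the random weights and the Tanh activation are illustrative assumptions; Tanh is the ActorCriticPolicy default unless activation_fn is overridden. Each (N_AGENTS, 4) observation is flattened row by row into N_AGENTS * 4 numbers, one per input neuron. The first hidden layer then computes tanh(W x + b) on that vector:

```python
import numpy as np

# Hypothetical values for illustration; in the post N_AGENTS is len(self.agents).
N_AGENTS = 3
OBS_PER_AGENT = 4   # one row per agent, matching the columns of min_obs/max_obs
HIDDEN = 512        # width of each hidden layer in the post's custom net_arch

rng = np.random.default_rng(0)

def flatten_batch(obs_batch: np.ndarray) -> np.ndarray:
    """Mimic torch.nn.Flatten(): keep dim 0 (the batch), merge every other dim."""
    return obs_batch.reshape(obs_batch.shape[0], -1)

def first_layer(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One fully connected layer, tanh(x @ W^T + b), assuming the default Tanh activation."""
    return np.tanh(x @ w.T + b)

# A batch of 2 observations, each of shape (N_AGENTS, OBS_PER_AGENT).
batch = np.arange(2 * N_AGENTS * OBS_PER_AGENT, dtype=np.float32).reshape(
    2, N_AGENTS, OBS_PER_AGENT
)
flat = flatten_batch(batch)
print(flat.shape)  # (2, 12): one input neuron per number, N_AGENTS * 4 in total
print(flat[0])     # agent 0's 4 values, then agent 1's, then agent 2's

# Random stand-in weights; a real policy learns these during training.
w = rng.normal(size=(HIDDEN, N_AGENTS * OBS_PER_AGENT)).astype(np.float32)
b = np.zeros(HIDDEN, dtype=np.float32)
h = first_layer(flat, w, b)
print(h.shape)     # (2, 512)
```

As far as I understand, there is no special aggregation function. The bounds in min_obs/max_obs do not enter the computation, and the input width is simply len(self.agents) * 4, fixed when the policy is built.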

I know that the observations for all agents are concatenated and passed to the input layer, but how? The size of my observation can vary, because the number of agents whose information is included in the observation varies. What mathematical function is used, and how are the observations converted into NN input for the two observation neurons?
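The policy is built once from observation_space, so the input width stays at len(self.agents) * 4 for the whole run. An observation with a different number of agents will not fit the first layer. One common workaround, which SB3 does not do for you, is to pad every observation to a fixed maximum agent count and add a flag column that marks real agents. Here is a sketch of that idea, where MAX_AGENTS and the flag column are hypothetical choices:

```python
import numpy as np

# Hypothetical upper bound on simultaneous agents; choose it for your environment.
MAX_AGENTS = 5
OBS_PER_AGENT = 4

def pad_observation(agent_rows: np.ndarray) -> np.ndarray:
    """Pad a (k, 4) per-agent observation to a fixed (MAX_AGENTS, 5) array.

    Column 4 is a hypothetical 'present' flag (1.0 for a real agent, 0.0 for
    padding), so the network can tell padded rows from real zero-valued data.
    """
    k = agent_rows.shape[0]
    if k > MAX_AGENTS:
        raise ValueError(f"{k} agents exceed MAX_AGENTS={MAX_AGENTS}")
    out = np.zeros((MAX_AGENTS, OBS_PER_AGENT + 1), dtype=np.float32)
    out[:k, :OBS_PER_AGENT] = agent_rows  # real agent data
    out[:k, OBS_PER_AGENT] = 1.0          # mark those rows as present
    return out

obs_two = pad_observation(np.ones((2, OBS_PER_AGENT), dtype=np.float32))
obs_four = pad_observation(np.ones((4, OBS_PER_AGENT), dtype=np.float32))
print(obs_two.shape, obs_four.shape)  # (5, 5) (5, 5)
print(obs_two[:, -1])                 # [1. 1. 0. 0. 0.]
```

With this layout, the Box would be declared with shape (MAX_AGENTS, 5) and the flag column bounded to [0, 1]. The first layer then always sees MAX_AGENTS * 5 inputs, regardless of how many agents are currently present.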

Code: https://drive.google.com/file/d/1zhmLnigj_BqNOxuSri0Va0fMRRJs8slW

submitted by /u/Hooooman101