Stable baseline PPO isn’t learning


I’m new to the reinforcement learning community and I tried to make my first model using a custom environment. The concept is quite simple : a car that musts stay in a racing track and at the middle of the road. It gets rewarded its speed and its reaction regarding its position from distances to the walls at its sides and gets punished if the difference of theses distances is too high. The problem is that it seems to take random decisions when I test it after training. I use python, openai gym and stable_baselines3’s PPO.

Here are the 2 codes :

trainer : pastebin

tester : pastebin

Any help would be appreciated and please excuse me for my bad english.

submitted by /u/PokPok3515
[link] [comments]

Leave a Reply

The Future Is A.I. !
To top