SAC + HER can't exceed a success rate of around 0.8

Dear all, I am working on an algorithm for autonomous navigation of small unmanned vessels. I use SB3's SAC with a HER replay buffer for training. The rules of the env are quite simple.

After a reset, the ship is in the middle of the observation space, at [0.5, 0.5]. The destination is then chosen at random in the space [0-1, 0-1]. The reward is the negative Euclidean distance between the current position and the target. Success is defined as the ship being close to the target point (within a radius of 0.05 of the target). The episode is done when the ship gets to the target or hits the wall (in which case it gets a negative reward of -1). The action space is spaces.Box(low=np.array([-1, 0]), high=np.array([1, 1])), and it maps to the heading change and the speed as action = np.array([action[0] * 10, action[1] * 10], dtype=np.float32). The model is defined as:

model = SAC("MultiInputPolicy", env, buffer_size=buffer_size, replay_buffer_class=HerReplayBuffer, replay_buffer_kwargs=dict(copy_info_dict=True),
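For reference, here is a minimal sketch of the reward, success check, and action scaling as described above. It is not my actual env code: the function names and the SUCCESS_RADIUS constant are hypothetical, and it leaves out the -1 wall penalty and the motion model. The reward is written so that it works on a single pair of goals or on a batch, since SB3's HerReplayBuffer calls the env's compute_reward on batches of relabeled goals.

```python
import numpy as np

SUCCESS_RADIUS = 0.05  # success radius stated in the post


def scale_action(action):
    """Map a raw action in [-1, 1] x [0, 1] to (heading change, speed)."""
    return np.array([action[0] * 10, action[1] * 10], dtype=np.float32)


def compute_reward(achieved_goal, desired_goal, info=None):
    """Negative Euclidean distance between position and target.

    Accepts a single (x, y) pair or a batch of shape (n, 2); the norm is
    taken over the last axis so both cases work. `info` is unused here.
    """
    achieved_goal = np.asarray(achieved_goal, dtype=np.float32)
    desired_goal = np.asarray(desired_goal, dtype=np.float32)
    return -np.linalg.norm(achieved_goal - desired_goal, axis=-1)


def is_success(achieved_goal, desired_goal):
    """True where the ship is within SUCCESS_RADIUS of the target."""
    return -compute_reward(achieved_goal, desired_goal) <= SUCCESS_RADIUS


# Single transition: ship at the reset position, target straight "north".
single = compute_reward([0.5, 0.5], [0.5, 0.9])

# Batch of two transitions, as HER relabeling would pass them.
batch = compute_reward([[0.5, 0.5], [0.2, 0.2]], [[0.5, 0.9], [0.2, 0.2]])
```

With this shape of reward, the second transition in the batch has reward 0 because the achieved and desired goals coincide, which is the case HER relabeling produces.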

I tried to optimize the hyperparameters with the Optuna library, using this search space:

buffer_size = trial.suggest_categorical('buffer_size', [100000])

batch_size = trial.suggest_categorical('batch_size', [64, 128])

gamma = trial.suggest_loguniform('gamma', 0.95, 0.99)

learning_rate = trial.suggest_loguniform('learning_rate', 6.7e-4, 8.5e-4)

net_arch = trial.suggest_categorical('net_arch', [[1024, 1024, 1024], [2048, 2048, 2048], [1024, 1024, 1024, 1024]])
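To make the size of this search space concrete, here is a small standard-library sketch that draws one configuration from the same ranges. It is only an illustration: sample_loguniform and sample_trial are hypothetical helpers, and plain random sampling stands in for Optuna's sampler, which by default is not purely random. A log-uniform draw is taken as the exponential of a uniform draw between log(low) and log(high).

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_trial(rng):
    """Hypothetical stand-in for one trial over the ranges listed above."""
    return {
        "buffer_size": rng.choice([100000]),
        "batch_size": rng.choice([64, 128]),
        "gamma": sample_loguniform(0.95, 0.99, rng),
        "learning_rate": sample_loguniform(6.7e-4, 8.5e-4, rng),
        "net_arch": rng.choice(
            [[1024, 1024, 1024], [2048, 2048, 2048], [1024, 1024, 1024, 1024]]
        ),
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
params = sample_trial(rng)
```

Over ranges this narrow (0.95 to 0.99, and 6.7e-4 to 8.5e-4), a log-uniform draw is close to a plain uniform one, so in practice only batch_size and net_arch change much between trials.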

After several trials, I cannot get the success rate above roughly 0.85. The env is simple and I did it according to [...]. The motion model in my env is trivial. Please give me some advice, as I have been stuck on this for several weeks. Thanks!

submitted by /u/Sharp-Record1600