Questions about how to train the pickplacev2 task in MetaWorld


Has anyone successfully trained a Soft Actor-Critic (SAC) agent on the pickplacev2 task in MetaWorld? I have tried both my own implementation and libraries like Stable Baselines3, but the agent fails to learn effectively and struggles even to reach the object. I believe my code is correct, since it works well on other tasks such as reachv2, windowclosev2, and several others.

Interestingly, I have successfully trained a DDPG agent on this task by employing specific techniques. I think the SAC failure has two causes. First, the reward function of pickplacev2 does not seem to be well designed. The reward signal for the reach stage is very weak, often less than 0.03 at the initial position, which might contribute to the difficulty. This is perplexing because the paper suggests an improved reward design. Second, exploration seems to be important for this task. For DDPG, modifying the noise scale was crucial, which suggests that how the agent explores matters a lot here.
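To make the noise-scale point concrete, here is a minimal sketch of Gaussian action noise with an annealed scale, the kind of knob that is typically tuned for DDPG. It is not the exact setup used for the successful run: the helper names `add_exploration_noise` and `linear_sigma`, the start/end values, the decay length, and the assumption of a 4-dimensional action bounded in [-1, 1] are all illustrative.

```python
import random


def add_exploration_noise(action, sigma, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise to each action dimension, then clip.

    `low` and `high` are assumed action bounds, not values read from the env.
    """
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in action]


def linear_sigma(step, start=0.3, end=0.05, decay_steps=100_000):
    """Linearly anneal the noise scale from `start` to `end`.

    All three defaults are hypothetical and would need tuning per task.
    """
    frac = min(1.0, step / decay_steps)
    return start + frac * (end - start)


if __name__ == "__main__":
    rng = random.Random(0)
    policy_action = [0.2, -0.5, 0.9, 0.0]  # placeholder for the actor's output
    for step in (0, 50_000, 100_000):
        sigma = linear_sigma(step)
        noisy = add_exploration_noise(policy_action, sigma, rng=rng)
        print(step, round(sigma, 3), [round(x, 3) for x in noisy])
```

A larger starting scale makes the arm wander further early on, which may help it stumble onto the object when the reach reward is nearly flat; annealing then lets the policy settle once it has found useful behavior.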

If you have had success with SAC on this task, could you share any insights or tips on the specific configurations, hyperparameters, or modifications that made training work?

submitted by /u/DF_13