CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

Paper and Code: http://adityab.github.io/CrossQ

CrossQ is a model-free, off-policy method that matches or exceeds the current SOTA in sample efficiency without resorting to a high update-to-data ratio. In essence, it:

- takes SAC (a six-year-old method)
- deletes the target networks
- adds Batch Normalization

These simple edits are enough to surpass the strong performance of REDQ and DroQ, using only 5% as many gradient steps.
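The two edits above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the network sizes and hyperparameters are placeholders, the paper uses a Batch Renormalization variant where plain `BatchNorm1d` appears here, and `policy` stands in for a full SAC actor. The core idea it shows is real, though: with no target network, the current and next state-action pairs are concatenated into one forward pass so BatchNorm computes statistics over their joint distribution.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q-network with BatchNorm layers; note there is no target copy."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.BatchNorm1d(hidden),  # paper uses BatchRenorm; plain BN for brevity
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def crossq_critic_loss(critic, batch, policy, gamma=0.99):
    """Bellman loss without a target network (illustrative sketch)."""
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act = policy(next_obs)  # placeholder for the SAC actor's sample
    # Key trick: push (s, a) and (s', a') through the SAME forward pass,
    # so BatchNorm normalizes over their joint distribution. This is what
    # makes dropping the target network stable.
    all_obs = torch.cat([obs, next_obs], dim=0)
    all_act = torch.cat([act, next_act], dim=0)
    q_all = critic(all_obs, all_act)
    q, q_next = torch.chunk(q_all, 2, dim=0)
    target = rew + gamma * (1.0 - done) * q_next.detach()
    return ((q - target) ** 2).mean()
```

(The entropy bonus and actor update are omitted; they follow standard SAC.)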

Twitter: https://twitter.com/aditya_bhatt/status/1768342823747674377
ICLR 2024 spotlight talk: https://iclr.cc/virtual/2024/poster/18699

(first coauthor here, happy to help!)

Results figure: https://preview.redd.it/0bhb82z77mzc1.png?width=3582&format=png&auto=webp&s=41f3071cb84445734a6338e672658c99044ea560

submitted by /u/RoboticsLiker
