CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

Paper and Code:

CrossQ is a model-free off-policy method that exceeds the current SOTA in sample efficiency without requiring extra gradient steps per environment step. It essentially:

- takes SAC (a six-year-old method)
- deletes target networks
- adds Batch Normalization

These simple edits are enough to surpass the strong performance of REDQ and DroQ while using only 5% as many gradient steps.
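For intuition, here is a minimal sketch (in PyTorch, with hypothetical names of my own, not the authors' code) of the core idea: a BatchNorm critic with no target network, where the current and next state-action pairs share one forward pass so the BatchNorm statistics are computed over the joint batch. The entropy term and twin critics of full SAC are omitted for brevity.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    # Q-network with BatchNorm layers; no separate target network is kept.
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def crossq_critic_loss(q, obs, act, rew, next_obs, next_act, gamma=0.99):
    # Key trick: push (s, a) and (s', a') through the SAME network in ONE
    # forward pass, so BatchNorm normalizes over the joint batch. A simple
    # stop-gradient on the bootstrap value replaces the target network.
    both = q(torch.cat([obs, next_obs], dim=0),
             torch.cat([act, next_act], dim=0))
    q_cur, q_next = both.chunk(2, dim=0)
    target = rew + gamma * q_next.detach()  # stop-gradient, no target net
    return ((q_cur - target) ** 2).mean()
```

Usage: sample a batch from the replay buffer, get `next_act` from the current policy, compute the loss, and take one gradient step (update-to-data ratio of 1, no extra gradient steps).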

ICLR 2024 spotlight talk:

(First co-author here, happy to help!)

submitted by /u/RoboticsLiker
