TD Error assignment in Double DQN with Prioritized Experience Replay

How would this work? The TD error of an observation would be different for each network. Should I store two TD error values per observation and use the one for the network currently being trained, or use the minimum of the two TD errors and hope that the more accurate network eventually “teaches” the other?
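For what it's worth, in the standard Double DQN formulation there is only one TD error per transition: the online network selects the greedy next action, the target network evaluates it, and the absolute value of that single error is the usual PER priority. A minimal sketch of that convention, using tabular Q-arrays as stand-ins for the two networks (all names and values here are hypothetical):

```python
import numpy as np

def double_dqn_td_error(q_online, q_target, s, a, r, s_next, done, gamma=0.99):
    """Double DQN TD error for a batch of transitions.

    The online network selects the next action; the target network
    evaluates it. This yields one TD error per transition, whose
    absolute value is the usual PER priority.
    """
    next_a = np.argmax(q_online[s_next], axis=1)       # action selection: online net
    next_q = q_target[s_next, next_a]                  # action evaluation: target net
    td_target = r + gamma * next_q * (1.0 - done)      # bootstrap unless terminal
    return td_target - q_online[s, a]

# Toy "networks": 3 states x 2 actions with arbitrary values
rng = np.random.default_rng(0)
q_online = rng.normal(size=(3, 2))
q_target = rng.normal(size=(3, 2))

s      = np.array([0, 1])
a      = np.array([1, 0])
r      = np.array([1.0, 0.0])
s_next = np.array([2, 2])
done   = np.array([0.0, 1.0])   # second transition is terminal

td = double_dqn_td_error(q_online, q_target, s, a, r, s_next, done)
priority = np.abs(td) + 1e-6    # PER priority with a small epsilon
```

Under this convention there is no need to pick between two stored errors: the replay buffer keeps one priority per transition, refreshed whenever that transition is sampled and its TD error recomputed.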

submitted by /u/AUser213
