Does Expected SARSA suffer from Maximization Bias?

Maximization Bias in Q-Learning comes from the max operator in the update equation. Similarly, SARSA with an epsilon-greedy policy selects the greedy next action most of the time, so its updates are also based on the max most of the time. Expected SARSA, however, has no explicit max operation in its update rule. So under a non-greedy target policy, will Expected SARSA have maximization bias?
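To make the comparison concrete, here is a minimal sketch of the three update targets and a toy check of how each behaves when the Q estimates are pure noise. Everything in it is an assumption for illustration: the function names, the single next state whose true action values are all 0, the Gaussian estimation noise, and the default parameter values. It is not taken from any particular library or textbook implementation.

```python
import numpy as np

def q_learning_target(q_next, reward, gamma):
    # Bootstraps from the max over the (noisy) next-state estimates.
    return reward + gamma * np.max(q_next)

def sarsa_target(q_next, reward, gamma, epsilon, rng):
    # Samples the next action epsilon-greedily, then bootstraps from it.
    n = len(q_next)
    if rng.random() < epsilon:
        a = rng.integers(n)          # exploratory action, uniform
    else:
        a = int(np.argmax(q_next))   # greedy action
    return reward + gamma * q_next[a]

def expected_sarsa_target(q_next, reward, gamma, epsilon):
    # Bootstraps from the expectation under an epsilon-greedy target policy.
    n = len(q_next)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_next)] += 1.0 - epsilon
    return reward + gamma * float(np.dot(probs, q_next))

def average_targets(n_actions=10, noise_std=1.0, epsilon=0.1,
                    trials=20000, seed=0):
    # Hypothetical toy setup: every true action value is 0, so any
    # nonzero average target is bias caused by estimation noise alone.
    rng = np.random.default_rng(seed)
    totals = np.zeros(3)
    for _ in range(trials):
        q_next = rng.normal(0.0, noise_std, size=n_actions)
        totals[0] += q_learning_target(q_next, 0.0, 1.0)
        totals[1] += sarsa_target(q_next, 0.0, 1.0, epsilon, rng)
        totals[2] += expected_sarsa_target(q_next, 0.0, 1.0, epsilon)
    return totals / trials  # [Q-learning, SARSA, Expected SARSA]

if __name__ == "__main__":
    print("epsilon=0.1:", average_targets(epsilon=0.1))
    print("epsilon=1.0:", average_targets(epsilon=1.0))
```

In this toy setting, the epsilon-greedy Expected SARSA target still puts weight 1 - epsilon (plus a small uniform share) on the argmax of the same noisy estimates, so its average should come out positive, only somewhat smaller than the Q-Learning one. With epsilon=1.0 the target policy is uniform and no longer depends on the estimates, and the average should sit near 0. That suggests the answer depends on how strongly the target policy is tied to the current Q estimates, though a single-state experiment like this is only a rough indication.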

submitted by /u/Then-Law2937
