Is muzero insanely sensitive to hyperparameters?

I have be trying to replicate muzero results using various opensource implementations for more than 50 hours. I tried pretty much every implementation i have been able to find and run. Of all those implementations i managed to see muzero converge once to find a strategy to walk a 5×5 grid. After that run i have not been able to replicate it. I have not managed to make it learn to play tic tac with the objective of drawing the game on any publicly available implementation. The best i managed to get was a success rate of 50%. I fidgeted with every parameter i have been able but it pretty much yielded no result.

Am i missing something? Is muzero incredibly sensitive to hyperparameters? Is there some secrete knowledge that is not explicit in papers or implementations to make it work?

submitted by /u/drblallo
[link] [comments]

Leave a Reply

The Future Is A.I. !
To top
en_USEnglish