Looking for citations for TensorBoard log parameters and their meanings

I am writing my thesis and would like to add a chapter on how I will judge the performance of the learned policy. Mainly I am looking at the rewards, the entropy coefficient, and the losses logged to TensorBoard. Even though these are straightforward, is there a paper or official source that I could cite for them?

