How to access policy state with good train results? #711
-
Hi everyone, I implemented a custom environment (trading). When an episode is done, it prints out the sum of the rewards and some info. During training I sometimes see some pretty good rewards in these results, but even then the policy is not stored (`save_best` is not called). And unfortunately, even when it is stored and loaded back afterwards, the results never reach those good episodes. I'm pretty sure it is something that I'm missing and not understanding here, not a technical problem or a bug, and I hope you can give me some advice.
Is `best_reward` always the best reward from a test? (tianshou/tianshou/trainer/base.py Line 348 in f270e88) My test and train envs are identical and I'm using DQN (offpolicy). I was assuming that train and test should produce the same results (which is obviously wrong). Thank you
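For reference, here is roughly how I call the trainer and reload the saved weights afterwards (a simplified sketch: the env/collector setup is omitted, and `best.pth` plus the hyperparameter values are just placeholders):

```python
import torch
from tianshou.trainer import offpolicy_trainer

# As far as I can tell, save_fn only fires when a *test* run achieves
# a new best_reward, never directly from a good training episode.
result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    update_per_step=0.1, episode_per_test=10, batch_size=64,
    save_fn=lambda policy: torch.save(policy.state_dict(), "best.pth"),
)

# Later, restoring the saved weights:
policy.load_state_dict(torch.load("best.pth"))
policy.eval()
```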
-
DQN's performance is largely affected by eps-greedy. `eps_test` and `eps_train` are set to different values, so that's the reason for the different performance between train and test. `best_reward` always comes from a test. But if you are curious about "some pretty good result" in training, you can set `test_in_train=True` in the offpolicy trainer. This will freeze the policy and call `test_episode` to evaluate it once a training episode's reward rises above the given threshold.
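For example, the two eps values are typically wired in through `train_fn`/`test_fn`, and `test_in_train` hooks into `stop_fn`. An illustrative sketch (not a drop-in config; `reward_threshold` is a placeholder you'd choose yourself):

```python
from tianshou.trainer import offpolicy_trainer

reward_threshold = 100.0  # placeholder: whatever counts as "good enough"

result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    update_per_step=0.1, episode_per_test=10, batch_size=64,
    # exploration noise used while collecting training episodes
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),
    # (near-)greedy behavior used during evaluation
    test_fn=lambda epoch, env_step: policy.set_eps(0.05),
    stop_fn=lambda mean_rewards: mean_rewards >= reward_threshold,
    # when a training episode clears the threshold, the trainer pauses
    # collection and runs test_episode with eps_test to verify it
    test_in_train=True,
)
```

That way, a lucky high-reward training episode is only trusted if the policy also performs well under `eps_test`.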