How to reproduce fig. A.1 #8

Cmeo97 · 2023-06-07T14:07:18Z

Hi Danijar,

Reading the appendix of the Director paper, I couldn't understand what you mean by providing the task reward to the worker. Is there a config option I can use to do that? In the figure caption you write: "When additionally providing task reward to the worker". Does this mean that you change the context variable defined in hierarchy.py so that it includes the reward as well? Also, if it works so well, why don't you do it by default? Have you tried the same on other tasks as well (e.g., Ant Maze)?

Thank you so much!

Best,
Cristian

jdubkim · 2023-06-16T13:20:57Z

It is set by default. You can see it in this line of configs.yaml: worker_rews: {extr: 1.0, expl: 0.0, goal: 1.0}
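To make the effect of this setting concrete, here is a minimal sketch. It uses a hypothetical helper (active_reward_streams, not code from the Director repo) that lists which worker reward streams would be active for a given worker_rews dictionary. It assumes that entries with weight 0.0 are simply skipped, so setting extr to a nonzero value would add a task-reward stream alongside the goal stream:

```python
# Hypothetical helper, not taken from the Director code base.
# Assumption: a reward stream with weight 0.0 is ignored entirely.
def active_reward_streams(worker_rews: dict[str, float]) -> list[str]:
    """Return the names of reward streams with a nonzero weight, in config order."""
    return [name for name, weight in worker_rews.items() if weight != 0.0]

default = {"extr": 0.0, "expl": 0.0, "goal": 1.0}
with_task_reward = {"extr": 1.0, "expl": 0.0, "goal": 1.0}

print(active_reward_streams(default))           # ['goal']
print(active_reward_streams(with_task_reward))  # ['extr', 'goal']
```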

Cmeo97 · 2023-06-16T13:37:29Z

I'm not sure I understood correctly. The default config file has worker_rews: {extr: 0.0, expl: 0.0, goal: 1.0}. Besides, as far as I can tell, changing this line would give the worker an additional critic, but I still don't see how the task reward would be provided to the worker. Also, if it works better on most tasks that don't have sparse rewards, why not use this config in general? Does it lead to worse performance in environments like Ant Maze?

Thank you so much!

jdubkim · 2023-06-16T14:02:53Z

Oh, my bad. I think I changed it in my local repo. If you set extr to a nonzero value, I think the worker then also uses the extrinsic reward predicted by the world model; see hierarchy.py. As for why the default is 0, I guess it reflects the intended design. Section 2.4 of the paper says: "We make this design choice to demonstrate that the interplay between the manager and the worker is successful across many environments, although we also include ...". If the worker receives the task reward, this could be seen as cheating, because ideally the worker policy should be rewarded only for whether it reached the goal.
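A rough sketch of how the worker_rews weights could enter the worker's objective. It assumes, without confirmation from this thread (check hierarchy.py and the actor-critic code in the repo), that each active reward stream has its own critic and that the per-stream advantages are combined as a weighted sum. The function name combine_advantages and the advantage values are hypothetical:

```python
import numpy as np

def combine_advantages(advantages, scales):
    """Weighted sum of per-stream advantages; streams with weight 0.0 are skipped."""
    total = None
    for name, scale in scales.items():
        if scale == 0.0:
            continue  # zero weight: this stream does not affect the worker update
        term = scale * np.asarray(advantages[name], dtype=np.float64)
        total = term if total is None else total + term
    if total is None:
        raise ValueError("at least one reward weight must be nonzero")
    return total

# Hypothetical per-timestep advantages for each reward stream.
advs = {"extr": [0.5, -0.25], "expl": [1.0, 1.0], "goal": [0.25, 0.5]}

print(combine_advantages(advs, {"extr": 0.0, "expl": 0.0, "goal": 1.0}).tolist())  # [0.25, 0.5]
print(combine_advantages(advs, {"extr": 1.0, "expl": 0.0, "goal": 1.0}).tolist())  # [0.75, 0.25]
```

Under this assumption, the default weights let only the goal signal drive the worker, while extr: 1.0 lets the task reward shift the worker's updates as well, which is the behavior discussed above.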

Cmeo97 · 2023-06-28T09:34:39Z

Thank you so much for your answer! Could you expand a bit on why it would be cheating? Why can't both the manager and the worker have access to the extrinsic reward? Conceptually, although the manager is supposed to have a higher-level perspective and should therefore have access to the extrinsic reward, I can't see why the worker shouldn't as well. Could you point me to a paper that discusses this, or expand on it? Thank you so much!
