How to reproduce fig. A.1 #8

Open
Cmeo97 opened this issue Jun 7, 2023 · 4 comments

Comments

Cmeo97 commented Jun 7, 2023

Hi Danijar,

Reading the appendix of the Director paper, I couldn't understand what you mean by providing the reward to the worker. Is there a config I can use to do that? The figure caption says: "When additionally providing task reward to the worker". Does that mean you change the context variable defined in hierarchy.py so that it includes the reward as well? Also, if it works so well, why isn't it the default? Have you tried the same for other tasks as well (e.g. Ant Mazes)?

Thank you so much!

Best,
Cristian

jdubkim commented Jun 16, 2023

It is set by default. You can see the config in this line of configs.yaml: worker_rews: {extr: 1.0, expl: 0.0, goal: 1.0}

Cmeo97 commented Jun 16, 2023

I’m not sure I understood correctly. The default config file has worker_rews: {extr: 0.0, expl: 0.0, goal: 1.0}. Besides, I think changing this line would give the worker an additional critic, but I still don’t see how the task reward would be provided to the worker. Also, if it works better for most tasks that don’t have sparse reward profiles, why not use this config in general? Does it lead to worse performance in envs like Ant Maze?

Thank you so much!

jdubkim commented Jun 16, 2023

Oh, my bad. I think I changed it in my local repo. If you change extr to a non-zero value, then I think the worker takes the extrinsic reward from the world model? It's in hierarchy.py. Also, I guess the default is set to 0 by design. Section 2.4 of the paper says: "We make this design choice to demonstrate that the interplay between the manager and the worker is successful across many environments, although we also include ...". If the worker takes the task reward, it can be thought of as cheating, because ideally the worker policy should be rewarded based on whether it reached the goal or not.
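To make the role of the worker_rews weights concrete, here is a minimal sketch. It assumes the weights act as per-source scales on the worker's learning signal and that a scale of 0.0 switches a source off entirely. The function names and the per-step values are hypothetical and are not taken from hierarchy.py; the real code may combine per-critic quantities rather than raw rewards.

```python
# Hypothetical illustration, NOT code from the Director repository.
# Assumption: each entry in worker_rews scales one reward source, and
# sources with a scale of 0.0 do not contribute to the worker at all.


def active_reward_sources(worker_rews):
    """Return the names of reward sources whose scale is non-zero."""
    return [name for name, scale in worker_rews.items() if scale != 0.0]


def combine_worker_signal(per_source, worker_rews):
    """Weighted sum of the per-source values for a single step."""
    return sum(
        worker_rews[name] * per_source[name]
        for name in active_reward_sources(worker_rews)
    )


# Default discussed in this thread: the worker only sees the goal reward.
default_rews = {'extr': 0.0, 'expl': 0.0, 'goal': 1.0}
# Variant discussed in this thread: the worker also sees the task reward.
with_task_rews = {'extr': 1.0, 'expl': 0.0, 'goal': 1.0}

# Made-up per-step values, for illustration only.
step = {'extr': 0.5, 'expl': 0.2, 'goal': 0.8}

default_signal = combine_worker_signal(step, default_rews)
task_signal = combine_worker_signal(step, with_task_rews)

print(active_reward_sources(default_rews), default_signal)
print(active_reward_sources(with_task_rews), task_signal)
```

Under these assumptions, setting extr to a non-zero value simply adds the task reward as a second contribution next to the goal reward, which matches the "additional critic" reading above.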

Cmeo97 commented Jun 28, 2023

Thank you so much for your answer! Could you expand a bit on why it would be like cheating? Why can't we give both the manager and the worker access to the extrinsic rewards? Conceptually, the manager is supposed to have a higher-level perspective and should therefore be able to access the extrinsic rewards, but I can't see why the worker shouldn't as well. Could you point me to a paper where this is discussed, or expand on it? Thank you so much!
