Issue on Ant-v2 expert data and Humanoid-v2 random seed Experiments #7

Open
XizoB opened this issue Sep 22, 2022 · 1 comment


XizoB commented Sep 22, 2022

Hi, thank you very much for sharing your paper and source code! I am new to inverse RL, and I would like to apply your method to a robot.
About Ant-v2

  1. I found that the reward for every step in your Ant-v2 expert data is 1. Why is the reward set like this? And how do I run SQIL correctly with your code?

About random seeds

  1. The results of the Humanoid experiments differ a lot between random seeds, and some runs only reach a return of around 1500. Is this because training runs for only 50,000 steps, or because only one expert demonstration is used?

I ran the following command, once per seed (seed=0 through seed=5):

python train_iq.py env=humanoid agent=sac expert.demos=1 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0/1/2/3/4/5 agent.init_temp=1
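A minimal sketch for launching these six runs from a script instead of by hand. It assumes train_iq.py accepts the same key=value overrides shown in the command above and is invoked from the repository root; the helper names build_command and run_all are hypothetical, not part of the repository. With dry_run=True it only prints the commands.

```python
import shlex
import subprocess
import sys
from typing import Iterable, List

# Overrides copied from the command in the issue; only the seed changes per run.
BASE_OVERRIDES = [
    "env=humanoid",
    "agent=sac",
    "expert.demos=1",
    "method.loss=v0",
    "method.regularize=True",
    "agent.actor_lr=3e-05",
    "agent.init_temp=1",
]


def build_command(seed: int, script: str = "train_iq.py") -> List[str]:
    """Return the argv list for a single training run with the given seed."""
    return [sys.executable, script, *BASE_OVERRIDES, f"seed={seed}"]


def run_all(seeds: Iterable[int] = range(6), dry_run: bool = True) -> None:
    """Print (and optionally execute) one run per seed, sequentially."""
    for seed in seeds:
        cmd = build_command(seed)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Using sys.executable instead of a bare python keeps every run on the same interpreter (and virtual environment) as the launcher. Pass dry_run=False to actually execute the runs one after another.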
[Image: seed]
Your work is very valuable, and I look forward to your help with these questions.

XizoB changed the title from "Issue on Ant-v2 and Humanoid-v2 random seed Experiments" to "Issue on Ant-v2 expert data and Humanoid-v2 random seed Experiments" on Sep 22, 2022

div-garg commented Nov 1, 2022

Hi, we only use the expert_rewards for SQIL, where the expert gets a reward of 1 and the policy gets a reward of 0. Storing fake rewards of 1 in the expert data makes this easy to implement. IQ-Learn, however, does not use expert rewards, so this field is never used there.
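A minimal sketch of the SQIL reward scheme described above, assuming transitions are stored as (obs, action, reward, next_obs, done) tuples. Expert transitions are labelled with reward 1 and policy transitions with reward 0, and each batch is drawn half from each buffer (the balanced sampling described in the SQIL paper). This is an illustration only, not the repository's implementation; relabel and sqil_batch are hypothetical names.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple

# (observation, action, reward, next_observation, done)
Transition = Tuple[Any, Any, float, Any, bool]


def relabel(transitions: Sequence[Transition], reward: float) -> List[Transition]:
    """Overwrite the stored reward of every transition with a constant."""
    return [(s, a, float(reward), s2, d) for (s, a, _r, s2, d) in transitions]


def sqil_batch(
    expert: Sequence[Transition],
    policy: Sequence[Transition],
    batch_size: int,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Sample a balanced SQIL batch: half expert (r = 1), half policy (r = 0)."""
    rng = rng if rng is not None else random.Random()
    half = batch_size // 2
    # Expert transitions always get reward 1, whatever was stored in the file.
    expert_part = relabel(rng.sample(list(expert), half), 1.0)
    # The agent's own transitions always get reward 0.
    policy_part = relabel(rng.sample(list(policy), batch_size - half), 0.0)
    return expert_part + policy_part
```

When the expert file already stores a constant 1, as the reply describes, the relabel step for expert transitions is a no-op, which is why storing the fake reward keeps the implementation simple.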

The variability you observe is most likely due to training on only 1 expert demo, which leads to high variance across seeds. Reducing the temperature to around 0.5 could help with this.
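To check whether a change such as a lower temperature actually reduces the seed-to-seed spread, one option is to report the mean, sample standard deviation and standard error of the final evaluation return over seeds. The helper below is a hypothetical sketch, not part of the repository. It assumes you have already extracted one final return per seed from the training logs.

```python
import math
import statistics
from typing import Dict, Tuple


def summarize_seeds(final_returns: Dict[int, float]) -> Tuple[float, float, float]:
    """Return (mean, sample std, standard error) of per-seed final returns."""
    values = list(final_returns.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate the spread")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return mean, std, std / math.sqrt(len(values))
```

Running the same six seeds with agent.init_temp=1 and agent.init_temp=0.5 (assuming that flag is the temperature the reply refers to) and comparing the std values shows whether the lower temperature helps, rather than relying on a single lucky seed.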
