postprocess_variants: Found multiple file patterns in input filename space #818
I don't run the steps one by one; I run "run_deepvariant". This is my command:
I have now added the following command as a workaround for the problem ...
In effect, this workaround sets --infile to "./[email protected]" and --nonvariant_site_tfrecord_path to "./[email protected]" (see the directory listing).
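The same workaround could be applied automatically before postprocess_variants is launched. The sketch below is a minimal illustration, not part of DeepVariant: pick_cvo_infile is a hypothetical helper, and it assumes that call_variants writes either call_variants_output.tfrecord.gz or shard files named like call_variants_output-00000-of-00001.tfrecord.gz (as in the directory listing), and that --infile accepts a sharded spec of the form call_variants_output@N.tfrecord.gz.

```python
import os
import re

# Shard names as seen in the listing, e.g. call_variants_output-00000-of-00001.tfrecord.gz
_CVO_SHARD = re.compile(r"^call_variants_output-\d{5}-of-(\d{5})\.tfrecord\.gz$")


def pick_cvo_infile(intermediate_dir):
    """Hypothetical helper: choose an --infile value for postprocess_variants.

    Prefers the unsharded file; otherwise returns an assumed '@N' sharded spec.
    """
    unsharded = os.path.join(intermediate_dir, "call_variants_output.tfrecord.gz")
    if os.path.exists(unsharded):
        return unsharded
    # Collect the shard totals (the NNNNN in -of-NNNNN) of all matching files.
    shard_counts = {
        int(m.group(1))
        for m in (_CVO_SHARD.match(name) for name in os.listdir(intermediate_dir))
        if m
    }
    if len(shard_counts) != 1:
        raise FileNotFoundError(
            f"no unambiguous call_variants output found in {intermediate_dir!r}")
    (num_shards,) = shard_counts
    return os.path.join(intermediate_dir,
                        f"call_variants_output@{num_shards}.tfrecord.gz")
```

Called with "." on the directory shown in this issue, it would return "./call_variants_output@1.tfrecord.gz"; if the unsharded file is present, that path is used unchanged.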
I extracted the three commands (make_examples, call_variants and postprocess_variants) from the output. Here they are:
And here are the last two commands with their stdout ...
@MiWitt, given that you are using
That can't be the cause. I am working in a cluster environment using Slurm, and the directory "." is the job-specific scratch directory, located at "/scratch/SlurmTMP/JobSpecificFolder" (${TMPDIR}).
@MiWitt, can you use
Hi, do you have any updates on this issue?
@MiWitt, I am closing the issue due to inactivity. Please feel free to reopen it if you have any updates.
@kishwarshafin Still the same problem.
Something is wrong in get_cvo_paths_and_first_record(): it cannot properly parse call_variants_output-00000-of-00001.tfrecord.gz.
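The mismatch between the unsharded name and the shard file on disk can be illustrated by expanding file specs the way the listing's file names suggest. A minimal sketch, assuming the "@N" notation maps name@N.ext to name-00000-of-0000N.ext through name-(N-1)-of-N; expand_sharded_spec is a hypothetical helper and not DeepVariant's own parser. A plain path such as ./call_variants_output.tfrecord.gz expands only to itself, so a literal lookup for that name would not match the single shard file in the listing.

```python
import re

# Assumed spec form: <base>@<N><suffix>, e.g. gvcf.tfrecord@8.gz or
# call_variants_output@1.tfrecord.gz. The greedy base backtracks to the last '@'.
_SPEC_RE = re.compile(r"^(?P<base>.+)@(?P<n>\d+)(?P<suffix>\..*)?$")


def expand_sharded_spec(spec):
    """Hypothetical helper: list the concrete file names a spec refers to."""
    m = _SPEC_RE.match(spec)
    if not m:
        return [spec]  # not sharded: the literal path is the only candidate
    n = int(m.group("n"))
    if n < 1:
        raise ValueError(f"shard count must be positive in {spec!r}")
    suffix = m.group("suffix") or ""
    return [f"{m.group('base')}-{i:05d}-of-{n:05d}{suffix}" for i in range(n)]
```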
@EgorGuga
@MiWitt, yes, thanks for that solution; I did something similar in the run_deepvariant script.
For anyone struggling with this error in Nextflow, I adapted the answer above into a strategy that works inside a Nextflow process. Because Nextflow aborts a process on any non-zero exit code, the 'if/then' approach from the previous solution doesn't work: the process exits before the alternative is attempted. Instead, you can use the || construct to run the DeepVariant call first; if it fails because it looks for call_variants_output.tfrecord.gz while only call_variants_output-00000-of-00001.tfrecord.gz exists, the second postprocess_variants call runs instead. I don't know why the call_variants output is given a sharded name when the unsharded path is hard-coded here: deepvariant/scripts/run_deepvariant.py, line 768 (commit 432d616).
I assume this is a bug?
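The || control flow described above can be sketched outside Nextflow too. This is a minimal Python illustration, not the poster's script: run_with_fallback and subprocess_runner are hypothetical helpers, and the two argument lists (for example, postprocess_variants with the unsharded --infile first and the sharded one second) are assumptions to be filled in from your own command.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(argv: Sequence[str]) -> int:
    """Run argv without a shell and return its exit status."""
    return subprocess.run(list(argv)).returncode


def run_with_fallback(primary: Sequence[str], fallback: Sequence[str],
                      run: Runner = subprocess_runner) -> int:
    """Equivalent of shell 'primary || fallback': fallback runs only on failure."""
    status = run(primary)
    if status == 0:
        return status
    return run(fallback)
```

Like ||, this retries on any failure of the first command, not only the file-pattern error, so it is worth logging the first exit status to avoid masking unrelated postprocess_variants problems.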
I would suggest running your command with Also, please attach the log showing where it fails.
@kishwarshafin Thanks,
After switching to v1.8.0, I ran the same code but hit the problem discussed in this thread. The following code, however, allowed it to run successfully. But after running it on HG002, HG003 and HG004 and benchmarking with hap.py, I got much worse results than with v1.5.0.
I didn't want to keep the hacky code above, so on a hunch I ran the DeepVariant call without the --intermediate_results_dir option. It ran successfully, and the results improved over v1.5.0.
Can you help me understand what the default intermediate results folder is when it isn't set explicitly with that parameter? I no longer see the intermediate results anywhere in the Nextflow work folders. Are they simply not written, or are they written somewhere like /var? I want to make sure there won't be issues if the same Slurm node runs multiple DeepVariant jobs. Thanks.
The default intermediate results dir is a temporary directory defined using
By default, DeepVariant will output to
They are being written to a temporary directory within
I'm not sure why you observe poor results with the longer command you provided, although it contains many options where I think the defaults should be sufficient. My current guess: if you are mounting the tmp directory within your work directory, multiple processes could be writing to the same intermediate folder, which in this case would be An option with singularity like
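If several DeepVariant jobs can land on the same node, one way to rule out a shared intermediate folder is to give every job its own directory and pass it explicitly via --intermediate_results_dir. A minimal sketch, assuming the job's scratch space is exposed through $TMPDIR (as on the Slurm cluster described earlier in the thread); make_job_intermediate_dir is a hypothetical helper.

```python
import os
import tempfile


def make_job_intermediate_dir(prefix: str = "deepvariant_") -> str:
    """Hypothetical helper: create a new, empty directory no other job shares.

    mkdtemp picks a unique name atomically; the base is $TMPDIR when set
    (e.g. a Slurm job scratch dir), otherwise the system temp directory.
    """
    job_id = os.environ.get("SLURM_JOB_ID")  # set by Slurm inside a job
    if job_id:
        prefix = f"{prefix}{job_id}_"
    base = os.environ.get("TMPDIR") or tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```

The returned path would then be passed as --intermediate_results_dir to run_deepvariant, so two jobs on the same node never write into the same folder.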
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6.1/docs/FAQ.md:
Describe the issue:
The postprocess_variants step fails with the following error message:
ValueError: ('Found multiple file patterns in input filename space: ', './call_variants_output.tfrecord.gz')
Setup
Steps to reproduce:
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_t3t5ek8u/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1419, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_t3t5ek8u/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_t3t5ek8u/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_t3t5ek8u/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1300, in main
    sample_name = get_sample_name()
  File "/tmp/Bazel.runfiles_t3t5ek8u/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1203, in get_sample_name
    _, record = get_cvo_paths_and_first_record()
  File "/tmp/Bazel.runfiles_t3t5ek8u/runfiles/com_google_deepvariant/deepvariant/postprocess_variants.py", line 1179, in get_cvo_paths_and_first_record
    raise ValueError(
ValueError: ('Found multiple file patterns in input filename space: ', './call_variants_output.tfrecord.gz')
Does the quick start test work on your system?
Please test with https://github.com/google/deepvariant/blob/r1.6/docs/deepvariant-quick-start.md.
Is there any way to reproduce the issue by using the quick start?
???
Any additional context:
Yes. If I change the "--infile" parameter of the postprocess_variants.py call from "./call_variants_output.tfrecord.gz" to "./[email protected]", it works. However, the postprocess_variants.py call is auto-generated by "/opt/deepvariant/bin/run_deepvariant". The error does not occur for every sample ...
Directory contents of intermediate_results_dir after the error occurred:
call_variants.log
call_variants_output-00000-of-00001.tfrecord.gz
gvcf.tfrecord-00000-of-00008.gz
gvcf.tfrecord-00001-of-00008.gz
gvcf.tfrecord-00002-of-00008.gz
gvcf.tfrecord-00003-of-00008.gz
gvcf.tfrecord-00004-of-00008.gz
gvcf.tfrecord-00005-of-00008.gz
gvcf.tfrecord-00006-of-00008.gz
gvcf.tfrecord-00007-of-00008.gz
make_examples.log
make_examples.tfrecord-00000-of-00008.gz
make_examples.tfrecord-00000-of-00008.gz.example_info.json
make_examples.tfrecord-00001-of-00008.gz
make_examples.tfrecord-00001-of-00008.gz.example_info.json
make_examples.tfrecord-00002-of-00008.gz
make_examples.tfrecord-00002-of-00008.gz.example_info.json
make_examples.tfrecord-00003-of-00008.gz
make_examples.tfrecord-00003-of-00008.gz.example_info.json
make_examples.tfrecord-00004-of-00008.gz
make_examples.tfrecord-00004-of-00008.gz.example_info.json
make_examples.tfrecord-00005-of-00008.gz
make_examples.tfrecord-00005-of-00008.gz.example_info.json
make_examples.tfrecord-00006-of-00008.gz
make_examples.tfrecord-00006-of-00008.gz.example_info.json
make_examples.tfrecord-00007-of-00008.gz
make_examples.tfrecord-00007-of-00008.gz.example_info.json
postprocess_variants.log
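The listing can also be summarized per shard set, which makes it easy to see that make_examples and the gVCF records form complete 8-shard sets, while call_variants wrote a single shard (and, per the listing, no unsharded call_variants_output.tfrecord.gz). A minimal sketch; summarize_shards is a hypothetical helper that only recognizes names of the form base-XXXXX-of-NNNNN.suffix seen above, and the base@N.suffix keys are an assumed notation.

```python
import re

# base-XXXXX-of-NNNNN.suffix, e.g. gvcf.tfrecord-00003-of-00008.gz
_SHARD = re.compile(r"^(?P<base>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})(?P<suffix>.*)$")


def summarize_shards(names):
    """Map each shard set, keyed as base@N.suffix, to (shards present, N)."""
    sets = {}
    for name in names:
        m = _SHARD.match(name)
        if not m:
            continue  # logs and unsharded files are not part of a shard set
        total = int(m.group("total"))
        key = f"{m.group('base')}@{total}{m.group('suffix')}"
        sets.setdefault(key, (total, set()))[1].add(int(m.group("idx")))
    return {key: (len(idx), total) for key, (total, idx) in sets.items()}
```

For example, summarize_shards(os.listdir(".")) on the directory above would report call_variants_output@1.tfrecord.gz as (1, 1) and gvcf.tfrecord@8.gz as (8, 8); a set with fewer shards present than expected would point to an incomplete step.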