MSA Compression #73

Closed
2 of 10 tasks
lrauschning opened this issue Oct 18, 2023 · 7 comments
Comments

@lrauschning
Contributor

lrauschning commented Oct 18, 2023

Tracking issue for the MSA compression in this pipeline.

Tasks

@lrauschning
Contributor Author

When compressing with a separate module, most compression tools delete the input file by default.
This behaviour makes sense, as keeping the uncompressed and compressed files side by side would save no space.
However, it breaks Nextflow's paradigm: running one module would alter the output of an earlier step, potentially breaking resume support.
For this reason, the MSA modules should natively produce compressed output that is then uncompressed as needed.
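
To illustrate the "uncompressed as needed" part, here is a minimal Python sketch of a reader that accepts both plain and gzip-compressed alignment files, so a downstream step does not need an uncompressed copy on disk. The names `open_alignment` and `read_alignment` are hypothetical and not part of the pipeline; the sketch only assumes the compressed files are in gzip format.

```python
import gzip
from pathlib import Path

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def open_alignment(path):
    """Open an alignment file as text, whether or not it is gzip-compressed.

    Hypothetical helper for illustration only. The file is detected by its
    magic bytes rather than its extension, so a misnamed file still works.
    """
    path = Path(path)
    with path.open("rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return path.open("rt")


def read_alignment(path):
    """Return the full text of a plain or gzipped alignment file."""
    with open_alignment(path) as handle:
        return handle.read()
```

With something like this, the input file is only read, never modified or deleted, so the output of the earlier step stays intact.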

@alessiovignoli
Contributor

The compression step actually runs on the alignment output files, so it happens before the evaluation. It is a skippable step, and if it is run, the files are saved compressed. It should create a copy during compression, because the uncompressed version is passed to the evaluation step, which cannot read compressed files.
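
As a rough sketch of what "create a copy during compression" means, the Python snippet below writes a gzip copy next to the input and leaves the input untouched. The function name `compress_keep_input` is hypothetical and this is not the pipeline's actual module, only an illustration of the behaviour.

```python
import gzip
import shutil
from pathlib import Path


def compress_keep_input(src, dest=None):
    """Write a gzip-compressed copy of `src` and leave `src` in place.

    Hypothetical helper for illustration only. If `dest` is not given,
    the copy is written next to the input with a `.gz` suffix appended.
    """
    src = Path(src)
    dest = Path(dest) if dest is not None else src.with_name(src.name + ".gz")
    # Stream the bytes so large alignments are never held fully in memory.
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

On the command line, the equivalent is the `-k` (`--keep`) option of `gzip` and `pigz`, which keeps the input file instead of deleting it.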

@alessiovignoli
Contributor

From the documentation I could scavenge, FAMSA seems to be the only one able to compress the output using a flag.

FAMSA: Option for compressing output alignment to gzip (-gz switch).

Might need to look at the help page for each program while implementing.

NB: I could not find any documentation for the standalone version of MTMalign.

@lrauschning
Contributor Author

Yes, MTMalign has neither documentation nor a publicly accessible source code repository. On the plus side, at least it compiles on my machine ^^.
I think just calling pigz in the module after the call to the MSA tool is fine. The only annoying thing is that, for the Singularity and BioContainers backends, this requires us to construct a joint container of the MSA tool and pigz, unless anyone knows a way to use docker compose in nf-core modules.
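
The idea of compressing inside the same step as the MSA tool can be sketched in Python as follows: the tool's standard output is streamed straight into a gzip file, so no uncompressed intermediate is left behind for a later step to delete. This is only an illustration under assumptions: `run_and_compress` is a hypothetical name, the tool is assumed to write its alignment to stdout, and Python's `gzip` module stands in for pigz.

```python
import gzip
import shutil
import subprocess


def run_and_compress(cmd, out_path):
    """Run `cmd` and write its stdout to `out_path` as gzip.

    Hypothetical helper for illustration only; `cmd` is an argument list
    for a tool that prints its alignment to standard output.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            gzip.open(out_path, "wb") as fout:
        # Copy the tool's output into the compressed file chunk by chunk.
        shutil.copyfileobj(proc.stdout, fout)
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return out_path
```

Inside an actual module the same effect would come from piping the tool into pigz in the script block, which is why both programs need to be available in one container.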

@luisas
Collaborator

luisas commented Oct 24, 2023

I would not overcomplicate things now.
If the tools do not provide the compression, maybe we can just keep the zip module as we currently have it, at least for the first release, and then see who uses the pipeline and whether this is even necessary.

In any case, multi-package containers are doable: https://github.com/BioContainers/multi-package-containers

But for now, I would build a clean and functional pipeline first. Then we can polish these things once we have everything else in place :)

@lrauschning
Contributor Author

Branch where I am updating the modules to produce compressed output:
https://github.com/lrauschning/modules/tree/msa-compression
Made a pull request to create mulled containers for some tools:
BioContainers/multi-package-containers#2867
Will open a PR to nf-core once they are merged.

@luisas luisas added this to the First Release milestone Dec 14, 2023
@luisas
Collaborator

luisas commented May 21, 2024

Closed by #124

@luisas luisas closed this as completed May 21, 2024

3 participants