I am currently conducting Whole Genome Bisulfite Sequencing (WGBS) data analysis using Bismark and plan to utilize a soft-masked genome, where all repetitive and low-complexity regions are marked with lowercase letters.
During the index generation step, I observed that the index created is consistent with the unmasked genome. However, I noticed a significant difference in the results during the alignment step, specifically in the number of uniquely aligned reads. It appears that tools like Bowtie2 ignore the soft-masking, treating the lowercase letters as uppercase during alignment.
Is there a specific parameter or approach in Bismark that would allow me to achieve alignment results with the soft-masked genome that are comparable to those obtained with the unmasked genome? Any guidance or advice would be greatly appreciated!
Thank you!
To be perfectly honest, I don't know for certain whether Bowtie2 treats soft-masked genomes differently from unmasked ones, but I don't think it does. (Google doesn't seem to know either: searching for "how does Bowtie2 treat soft-masked index" didn't yield any great insights.)
What would you like to achieve by soft-masking repeats?
I'm sorry, I may not have expressed myself clearly. What I actually want to know is how to ensure consistent detection rates when using unmasked and soft-masked genomes in Bismark. We have used soft-masked genomes in other omics analyses, so we hope to maintain consistency. However, when we compared unmasked and soft-masked genomes in WGBS data analysis with Bismark, the subsequent methylation detection rates still differed, even though the generated indexes are the same.
I am afraid I don't really have an answer for you; I would have to do some tests with reproducible examples. I think your best bet would be to cross-post this question over at the Bowtie2 repo, as the effects of this behaviour will likely be part of Bowtie2's strategy for soft-masked indexes. If you get an answer, I'd be curious to learn more details. Sorry if this is not immediately useful.
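As a starting point for a reproducible test, it may be worth first ruling out that the two FASTA files differ in anything other than letter case (for example hard-masked N stretches, or missing or renamed contigs). Below is a minimal Python sketch of such a check. It is not part of Bismark or Bowtie2, and the function names `sequence_digests` and `differing_records` are made up for illustration.

```python
import hashlib
from typing import Dict, Iterable, List


def sequence_digests(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA record name to an MD5 of its upper-cased sequence.

    The record name is taken as the first whitespace-delimited token of the
    header line. Line wrapping does not affect the digest.
    """
    digests: Dict[str, str] = {}
    name = None
    md5 = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if name is not None:
                digests[name] = md5.hexdigest()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            md5 = hashlib.md5()
        elif name is not None:
            # Upper-casing removes the soft-masking before hashing.
            md5.update(line.upper().encode("ascii"))
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests


def differing_records(masked: Iterable[str], unmasked: Iterable[str]) -> List[str]:
    """Return names of records that are missing from one input or whose
    sequences differ by more than letter case."""
    a = sequence_digests(masked)
    b = sequence_digests(unmasked)
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

Both arguments can be open file handles, so the check can be run by opening the soft-masked and unmasked FASTA files and passing them to `differing_records`. An empty list would suggest the two genomes are identical apart from case, in which case any difference in alignment results would have to come from somewhere else in the workflow.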