final kimura calculation missing for LINEs #173

Open
hy09 opened this issue Jan 8, 2025 · 1 comment

Comments

@hy09

hy09 commented Jan 8, 2025

Hi Toby,
Thanks for adding the re-calculation of Kimura distance to the final GFF file. However, I've noticed that the values are missing (KIMURA80=nan) for a significant number of LINEs and LTRs. This is due to a discrepancy in repeat IDs: in RepeatMasker's .out file, the suffixes "_3end", "_5end", and "_orf2" are removed from the repeat name after the positions are adjusted. As a result, the corresponding consensus sequence cannot be found in the library, and no error message is produced.

Here is an example line in the final .gff file:

1	Earl_Grey	LINE/L1	3054064	3054829	1592	-	.	TSTART=5335;TEND=6124;ID=L1MB4;SHORTTE=F;KIMURA80=nan
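To gauge how many records are affected, a short script can list the IDs whose KIMURA80 value is missing. This is only a sketch and not part of Earl Grey: the function names are hypothetical, and it assumes the file is tab-separated with the attributes in the ninth column, as in the line above.

```python
import math


def parse_attributes(column):
    """Split a 'KEY=value;KEY=value' attribute column into a dict."""
    return dict(field.split("=", 1) for field in column.split(";") if "=" in field)


def missing_kimura(gff_line):
    """Return the repeat ID if KIMURA80 is absent or nan, otherwise None.

    Assumption: fields are tab-separated and attributes sit in column 9.
    """
    fields = gff_line.rstrip("\n").split("\t")
    attrs = parse_attributes(fields[8])
    value = attrs.get("KIMURA80", "nan")
    try:
        is_missing = math.isnan(float(value))
    except ValueError:
        # Treat any non-numeric value as missing.
        is_missing = True
    return attrs.get("ID") if is_missing else None
```

Running `missing_kimura` over every non-comment line of the final GFF and counting the returned IDs would show which families are hit most often.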

Corresponding repeat IDs in the Dfam database:

>L1MB4_3end#LINE/L1 @Eutheria [S:45,55]
>L1MB4_5end#LINE/L1 @Eutheria [S:55]

With LTRs, some of the RepeatMasker outputs have "-INT" added to the repeat name, which causes the same problem.
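One possible workaround, sketched below, is to index the library under both the full and the suffix-stripped ID, and to retry a failed lookup with "-INT" removed. This is an illustration rather than the Earl Grey implementation: the function names are hypothetical, the suffix list contains only the cases mentioned above, and stripping "-INT" case-insensitively is an assumption.

```python
import re

# Assumption: only the suffixes named in this issue are handled; others may exist.
LIBRARY_SUFFIX = re.compile(r"_(3end|5end|orf2)$")
OUTPUT_SUFFIX = re.compile(r"-int$", re.IGNORECASE)


def library_id(header):
    """Extract the repeat ID from a FASTA header like '>L1MB4_3end#LINE/L1 @Eutheria'."""
    return header.lstrip(">").split("#", 1)[0].split()[0]


def build_index(headers):
    """Map each full and suffix-stripped ID to the library IDs that share it."""
    index = {}
    for header in headers:
        full = library_id(header)
        index.setdefault(full, []).append(full)
        stripped = LIBRARY_SUFFIX.sub("", full)
        if stripped != full:
            index.setdefault(stripped, []).append(full)
    return index


def candidates(gff_id, index):
    """Return library IDs that could match an ID taken from the GFF attributes."""
    # Try the ID as written first, then with a trailing '-INT' removed.
    for key in (gff_id, OUTPUT_SUFFIX.sub("", gff_id)):
        if key in index:
            return index[key]
    return []
```

With the two Dfam headers above, `candidates("L1MB4", index)` returns both `L1MB4_3end` and `L1MB4_5end`. Choosing between them would still need the TSTART/TEND coordinates, and an empty result could be logged as a warning instead of silently producing nan.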

Any comments would be appreciated. Thanks!

@TobyBaril
Owner

This is due to using pre-existing libraries in addition to the de novo pipeline. @jamesdgalbraith we should be able to work something out for this I think?
