Wrong deletion masking for AAV task? #18

dlnp2 · 2022-08-23T08:25:59Z

@sacdallago hi, thank you very much for your great data curation. I am planning to use the AAV dataset for my research.

I found that some deletion masks may not have been properly applied to the wild type sequence: as the image below shows, there are 29 sequences with different mutation_mask values but the same full_aa_sequence as the wild type. Is this the intended result?

Below is the code for replication:

import pandas as pd
from Bio import SeqIO

# Wild-type sequence from the UniProt FASTA file for P03135
wt_seq = str(next(SeqIO.parse("P03135.fasta", "fasta")).seq)
variant_effects = pd.read_csv("full_data.csv")
# Rows whose full sequence is identical to the wild type
wild_types = variant_effects.loc[variant_effects["full_aa_sequence"] == wt_seq]
wild_types
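
To see how many distinct masks map onto the wild-type sequence, and which categories those rows carry, something like the following could help. It is only a dependency-free sketch on invented toy rows: the column names come from the code above, but the sequences, the mask strings, and the category labels other than "stop" are made up for illustration and are not taken from full_data.csv.

```python
import csv
import io

# Toy stand-in for full_data.csv. All rows are invented for illustration.
TOY_CSV = """full_aa_sequence,mutation_mask,category
MKTAY,_____,wild_type
MKTAY,__*__,stop
MKTAY,___*_,stop
MKSAY,__S__,designed
"""
wt_seq = "MKTAY"  # hypothetical wild type, NOT the real P03135 sequence

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

# Rows whose full sequence is identical to the wild type
wt_rows = [r for r in rows if r["full_aa_sequence"] == wt_seq]

# Distinct masks and categories among those rows
masks = sorted({r["mutation_mask"] for r in wt_rows})
categories = sorted({r["category"] for r in wt_rows})

print(len(wt_rows), "rows match the wild type")
print("masks:", masks)
print("categories:", categories)
```

On the real file, the same grouping (for example with value_counts() on the category column of wild_types) would show whether the 29 rows share a single category.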


alex-hh · 2023-09-08T12:47:03Z

I believe these may be sequences containing stop codons, which are sometimes represented with '*'. This is also implied by these sequences having the value 'stop' in the category column. There are a few extra variants containing stop codons that end up with different sequences from those above because they also contain other mutations. If that's right, then I think (i) all such variants should be excluded from all splits, since models do not encode the stop codon and so cannot predict the fitness of these sequences, and (ii) the README file https://github.com/J-SNACKKB/FLIP/tree/main/splits/aav should be corrected to say that "*" in the mutation mask and mutated region means a stop codon, not a deletion.

To identify all such rows:

import pandas as pd

variant_effects = pd.read_csv("full_data.csv")
stop_variants = variant_effects[variant_effects["category"]=="stop"]

This is equivalent to selecting all variants in which the mutation mask contains "*":

stop_variants = variant_effects[variant_effects["mutation_mask"].apply(lambda x: "*" in x)]
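
The claimed equivalence of the two selections can be checked directly, and the same condition can be used to drop stop variants before building splits. The following is a minimal standard-library sketch on invented toy rows; only the column names, the "stop" category value, and the "*" marker come from this thread, everything else is an assumption for illustration.

```python
import csv
import io

# Toy stand-in for full_data.csv. All rows are invented for illustration.
TOY_CSV = """full_aa_sequence,mutation_mask,category
MKTAY,_____,wild_type
MKTAY,__*__,stop
MKSAY,_*S__,stop
MKSAY,__S__,designed
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

# Selection 1: rows labelled as stop variants
by_category = [r for r in rows if r["category"] == "stop"]
# Selection 2: rows whose mask contains the '*' marker
by_mask = [r for r in rows if "*" in r["mutation_mask"]]

# Rows on which the two criteria disagree (empty if they are equivalent)
mismatched = [
    r for r in rows
    if (r["category"] == "stop") != ("*" in r["mutation_mask"])
]
print("equivalent:", by_category == by_mask and not mismatched)

# Rows that would remain after excluding stop variants from the splits
kept = [r for r in rows if "*" not in r["mutation_mask"]]
print(len(kept), "rows kept")
```

If mismatched is non-empty on the real data, those rows are the ones worth inspecting by hand.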

Some of these sequences contain stop codons which are effectively 'insertions' and some contain stop codons which are 'substitutions'. The two cases aren't distinguished by mutation_mask.
