amplicon for Targeted Sequencing #475
Comments
Hi @tangxj98, The amplicon sequence (CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAGTTTAAGATCCACACGTACTCCAGCCCCAC) is the sequence of the first region in your region file (focused.region.txt) as contained in your genome file (GCA_000001405.15_GRCh38_no_alt_short_headers_nonACTG_to_N.fa). What is the design of your targeted sequencing experiment? Do you have two primers outside of the guides amplifying the entire interior region? If you are expecting deletion of the interior gene, the short PCR products will be artificially inflated compared to the longer non-deletion amplicons. I would suggest designing multiple primers to generate amplicons of approximately the same size like this:
This strategy would allow you to measure small indels (P1/P2 and P3/P4) as well as the long deletion (P1/P4).
Hi @kclem, Thanks for the reply. The targeted sequencing design actually consists of tiled primers that cover the whole genes. Below is how the customized targeted sequencing panel looks in the genome browser. The focused.region.txt I used was a BED file containing two 100 bp regions spanning the two double-strand break points of the sgRNAs. Is this the right way to make focused.region.txt? Sorry, I am a newbie with CRISPR. I sincerely appreciate your suggestions.
By specifying two 100bp regions, you'll be able to estimate the indel rate at the two sites, but you won't be able to measure the rate at which the interior gene is deleted (because the reads from cells where the gene is deleted won't align to your genome). Note that reads must cover the entire 100bp region to be included in the analysis, so I would also suggest shrinking the region to ~30bp depending on your read length. Again, the main problem with using WGS for this analysis is that the reads that support the large gene deletion aren't aligned to the genome reference and won't be considered for analysis. If you want to quantify indels at each target site as well as the large deletion, I would suggest creating 3 reference sequences: 1) the 100bp reference at the left cut site, 2) 100bp reference at the right cut site and 3) reference where the interior gene is deleted (probably 50bp from each arm, so 100bp in total). You can use CRISPRessoPooled to align reads to these references and analyze cutting frequencies. I hope that helps! |
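The three-reference idea above can be sketched in Python. This is a minimal illustration, not part of CRISPResso: the function name `build_references`, its parameters, and the convention that each cut position is the 0-based index of the first base after the cut are all assumptions made for the example. The flank sizes are parameters so the references can be made longer than the read length if needed.

```python
def build_references(locus_seq, left_cut, right_cut, site_flank=50, junction_arm=50):
    """Build three reference sequences for a two-guide deletion experiment.

    locus_seq    -- sequence of the targeted locus (hypothetical input)
    left_cut     -- 0-based index of the first base after the left cut
    right_cut    -- 0-based index of the first base after the right cut
    site_flank   -- bases kept on each side of a cut for the per-site references
    junction_arm -- bases kept from each arm for the deletion reference
    """
    margin = max(site_flank, junction_arm)
    if not (margin <= left_cut < right_cut <= len(locus_seq) - margin):
        raise ValueError("cut sites are too close to the ends of locus_seq")

    # References 1 and 2: unedited sequence centred on each cut site.
    left_ref = locus_seq[left_cut - site_flank:left_cut + site_flank]
    right_ref = locus_seq[right_cut - site_flank:right_cut + site_flank]

    # Reference 3: sequence left of the first cut joined directly to the
    # sequence right of the second cut, i.e. the interior removed.
    deletion_ref = (locus_seq[left_cut - junction_arm:left_cut]
                    + locus_seq[right_cut:right_cut + junction_arm])

    return {"left_site": left_ref, "right_site": right_ref, "deletion": deletion_ref}
```

With the defaults each reference is 100 bp, matching the sizes suggested above; the returned sequences would then be written into the amplicons file passed to CRISPRessoPooled.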
Hi @kclem, I used CRISPRessoPooled to run the job. However, strangely, the intermediate BAM had no aligned reads.
The AMPLICON file:
What makes it strange is that I can grep the whole "TGGAGTACGTGTGGATCTTAAACTTGTGTTTGCTGCGGGGGTCCTGGAGCCAAGAG" from the intermediate out.extendedFrags.fastq.gz and find many exact matches. Why did the bowtie alignment end up with nothing mapped? Could you please provide some insight here? Or should I run it in mixed or genome mode? Many thanks!
How long are your reads? Perhaps bowtie has a hard time aligning reads to references shorter than the read length. Yes, you could try running in mixed mode or setting a larger amplicon length in your AMPLICONS_FILE.txt. You should also check that your AMPLICONS_FILE.txt is correct. In your CRISPRessoPooled command you reference AMPLICONS_FILE.txt, but you cat ../AMPLICONS_FILE.txt. |
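A quick way to test the "reference shorter than the read" hypothesis is to compare each amplicon length with the read length before running the pipeline. The sketch below assumes the amplicons file is tab-separated with the amplicon name in the first column and its sequence in the second; the function name `short_amplicons` is made up for this example.

```python
def short_amplicons(lines, read_length):
    """Return (name, length) for every amplicon shorter than read_length.

    lines -- iterable of tab-separated rows; column 1 is assumed to be the
             amplicon name and column 2 the amplicon sequence.
    """
    flagged = []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # blank or malformed row
        name, seq = fields[0], fields[1].upper()
        if not seq or set(seq) - set("ACGTN"):
            continue  # probably a header row rather than a sequence
        if len(seq) < read_length:
            flagged.append((name, len(seq)))
    return flagged

# Example use (the file name is taken from the thread, the read length is yours):
# with open("AMPLICONS_FILE.txt") as handle:
#     print(short_amplicons(handle, read_length=150))
```

Any amplicon it reports is a candidate for lengthening, as suggested above.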
My reads are 150 bp. I will try to make a larger amplicon that is longer than the read length. The previous AMPLICONS_FILE.txt looks like:
Hi @kclem, I generated an amplicons_file.txt with amplicons 210 bp long, using the same format as above, which did help. I also upgraded to CRISPResso2 v2.3.1, which solved the numpy error. However, the run still died on amplicon #2. I ran the command suggested by the program:
CRISPResso -r1 CRISPRessoPooled_on_AMPLICONS_S106_Par/AMPL_deletion.fastq.gz -a CACCCCTCCGCCACCCTCCCCTTCTCAATCACACAGGTCAAGCAAGGTCAGGAGCCAGTGGAGCCCCAGGGCCACCTACCGAGGACAATGAGGACGTCCCTGTCGTGCGGGGGTCCTGGAGCCAAGAGAAACAACAGAAGGAGCCAGCGTCAGTGGAGCATGCCAACGGTTCATCCTACATCGAGAGGCACTGGAGAACTCACCAAGAAC -o CRISPRessoPooled_on_AMPLICONS_S106_Par --name deletion --trimmomatic_command None --quantification_window_size 1 --fastp_options_string " --disable_adapter_trimming --disable_trim_poly_g --disable_quality_filtering --disable_length_filtering" --needleman_wunsch_gap_incentive 1 --n_processes 1 --flexiguide_homology 80 --min_paired_end_reads_overlap 10 --prime_editing_gap_open_penalty -50 --min_average_read_quality 0 --aln_seed_count 5 --quantification_window_center -3 --prime_editing_pegRNA_scaffold_min_match_length 1 --flash_command None --conversion_nuc_from C --default_min_aln_score 60 --conversion_nuc_to T --flexiguide_seq None --needleman_wunsch_gap_extend -2 --needleman_wunsch_gap_open -20 --keep_intermediate --min_single_bp_quality 0 --exclude_bp_from_right 15 --aln_seed_min 2 --prime_editing_gap_extend_penalty 0 --plot_window_size 20 --exclude_bp_from_left 15 --max_rows_alleles_around_cut_to_plot 50 --min_bp_quality_or_N 0 --aln_seed_len 10 --verbosity 3 --max_paired_end_reads_overlap 160 --needleman_wunsch_aln_matrix_loc EDNAFULL --fastp_command fastp --min_frequency_alleles_around_cut_to_plot 0.2 --prime_editing_pegRNA_extension_quantification_window_size 5 --config_file None --amplicon_seq CACCCCTCCGCCACCCTCCCCTTCTCAATCACACAGGTCAAGCAAGGTCAGGAGCCAGTGGAGCCCCAGGGCCACCTACCGAGGACAATGAGGACGTCCCTGTCGTGCGGGGGTCCTGGAGCCAAGAGAAACAACAGAAGGAGCCAGCGTCAGTGGAGCATGCCAACGGTTCATCCTACATCGAGAGGCACTGGAGAACTCACCAAGAAC
The error message was:
I checked the intermediate file:
Amplicon #2 obviously has a lot of mapped reads. Why did I get the error "No alignments were found"?
I'm not sure exactly what the problem is. You could try:
Hi @kclem,
An additional question about CRISPRessoAggregate: it seems that a lot of plots failed due to the same two errors:
This doesn't seem to be a parameter that the user can set. Is there a quick fix for these errors?
Thanks @tangxj98, this bug has been fixed and the fix will be available in the next release. If you want to have the fix now, you can clone the repo ( |
Thank you, @Colelyman and @kclem. I installed v2.3.2 and the problem in Aggregate is gone. Thanks! Additional problems (sorry!):
INFO @ Thu, 12 Sep 2024 14:55:57 (0.0% done): ~/crispresso2_env/lib/python3.8/site-packages/CRISPResso2/CRISPRessoWGSCORE.py:238: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
~/crispresso2_env/lib/python3.8/site-packages/CRISPResso2/CRISPRessoWGSCORE.py:238: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
ERROR: infer_objects() got an unexpected keyword argument 'copy'
I think this is a pandas problem. Could you please let me know which pandas version works with v2.3.2? My installed version is pandas 1.5.3 (py38h417a72b_0).
Thanks @tangxj98, would you mind making separate issues for 1 and 3? Also in those issues could you provide the output with the As for point 2, could you provide an example of the sequences of the amplicons (including the wild type amplicon) and one of the aligned reads? If you still have the Hope this helps! |
Also, if you are aligning reads to 150bp on either side of the deletion, you may also get some WT reads aligning to one of the arms and not necessarily spanning the deletion. You may want to do another selection step where you only include reads that span the deletion. |
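One possible way to do that extra selection step is sketched below. It is a crude pre-filter and an assumption of this note, not a CRISPResso feature: a read counts as spanning the deletion only if it contains a short anchor from the left arm followed by a short anchor from the right arm, in either orientation. The anchors are set back from the junction by `offset` bases so that small indels at the junction do not disqualify a read. The names `spans_deletion`, `k`, and `offset` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def spans_deletion(read, left_arm, right_arm, k=15, offset=5):
    """True if the read carries sequence from both arms of the deletion reference.

    left_arm  -- reference bases immediately left of the deletion junction
    right_arm -- reference bases immediately right of the deletion junction
    k         -- anchor length (exact match required)
    offset    -- distance of each anchor from the junction
    """
    if min(len(left_arm), len(right_arm)) < k + offset:
        raise ValueError("arms must be at least k + offset bases long")
    end = len(left_arm) - offset
    left_anchor = left_arm[end - k:end].upper()
    right_anchor = right_arm[offset:offset + k].upper()

    read = read.upper()
    for seq in (read, revcomp(read)):
        i = seq.find(left_anchor)
        # The right anchor must occur downstream of the left anchor.
        if i != -1 and seq.find(right_anchor, i + k) != -1:
            return True
    return False
```

Wild-type reads that align to only one arm contain at most one of the two anchors and are rejected. Because the anchors must match exactly, reads with sequencing errors inside an anchor are also dropped, so this is best treated as a conservative filter.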
Hi @tangxj98, Just wanted to follow up to see if you are still running into these issues, if so please let us know! Thanks, |
I am working on CRISPR data from a sample treated with two sgRNAs and sequenced with targeted sequencing. The distance between the two sgRNAs is 3000 bp, which is much longer than an amplicon. The targeted area is the whole gene (380 kbp). Which mode should I use?
I tried to run with CRISPRessoWGS but got errors.
My WGS mode command:
CRISPRessoWGS -b S106_D.sorted.bam -f focused.region.txt -r refs/human/GRCh38/processed/2015_04_04/seqs_for_alignment_pipelines.ucsc_ids/bwa_index/GCA_000001405.15_GRCh38_no_alt_short_headers_nonACTG_to_N.fa --name CRISPR_S106_D -g TTAAACTTGTGTTTGCTGCG,GAGGACGTCCCTGTCGATGT
The error message asked me to try the amplicon mode:
ERROR: CRISPResso region #0 failed. For more information, try running the command: " CRISPResso -r1 CRISPRessoWGS_on_CRISPR_S106_D/ANALYZED_REGIONS/REGION_0.fastq.gz -a CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAGTTTAAGATCCACACGTACTCCAGCCCCAC -o CRISPRessoWGS_on_CRISPR_S106_D --name exon4-1 --max_rows_alleles_around_cut_to_plot 50 --prime_editing_pegRNA_scaffold_min_match_length 1 --needleman_wunsch_gap_extend -2 --needleman_wunsch_aln_matrix_loc EDNAFULL --conversion_nuc_to T --n_processes 1 --aln_seed_len 10 --trimmomatic_command trimmomatic --min_frequency_alleles_around_cut_to_plot 0.2 --min_bp_quality_or_N 0 --aln_seed_min 2 --flash_command flash --default_min_aln_score 60 --flexiguide_homology 80 --min_average_read_quality 0 --prime_editing_pegRNA_extension_quantification_window_size 5 --min_paired_end_reads_overlap 10 --needleman_wunsch_gap_incentive 1 --plot_window_size 20 --quantification_window_center -3 --quantification_window_size 1 --min_single_bp_quality 0 --max_paired_end_reads_overlap 100 --needleman_wunsch_gap_open -20 --conversion_nuc_from C --exclude_bp_from_left 15 --aln_seed_count 5 --exclude_bp_from_right 15 --guide_seq TTAAACTTGTGTTTGCTGCG,GAGGACGTCCCTGTCGATGT &> lo
I tried and the error message is:
ERROR: The guide sequence 1 (GAGGACGTCCCTGTCGATGT) provided is not present in the amplicon sequences!
I don't know how the amplicon sequence (CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAGTTTAAGATCCACACGTACTCCAGCCCCAC) was made up by the code. Could you please give some suggestion about how to work with my targeted sequencing data?
Thank you very much!
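For readers hitting the same "guide sequence is not present in the amplicon" error, a small standalone check can show which guide belongs to which region. The sketch below is independent of CRISPResso; the function name `locate_guide` is made up, and it only looks for exact matches on either strand.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def locate_guide(amplicon, guide):
    """Return (strand, position) of an exact guide match, or None if absent."""
    amplicon, guide = amplicon.upper(), guide.upper()
    pos = amplicon.find(guide)
    if pos != -1:
        return ("forward", pos)
    pos = amplicon.find(revcomp(guide))
    if pos != -1:
        return ("reverse", pos)
    return None

# Sequences quoted in this post.
amplicon = ("CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAG"
            "TTTAAGATCCACACGTACTCCAGCCCCAC")
for guide in ("TTAAACTTGTGTTTGCTGCG", "GAGGACGTCCCTGTCGATGT"):
    print(guide, locate_guide(amplicon, guide))
```

On these sequences the first guide is found on the reverse strand of the region 0 amplicon and the second guide is not found at all, which is consistent with the error: each 100 bp region contains only one of the two guides, yet both guides were passed to every region.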