Long-read RNA-seq (Kinnex) Pipeline

1. Download the `yml` file and create the conda environment

wget https://raw.githubusercontent.com/mydennislab/Long_read_RNA_seq/refs/heads/main/Long_read_RNA_seq_env.yml
conda env create -f Long_read_RNA_seq_env.yml
conda activate Long_read_RNA_seq_env
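To confirm the environment is active, you can check that every command this pipeline uses is on your PATH. This sketch assumes the environment provides executables named skera, lima, isoseq and oarfish, the names used in the commands in this README. `missing_tools` is a hypothetical helper and is not part of any of these packages.

```shell
# Print the name of every requested tool that is NOT found on PATH.
# Assumption: the conda environment exposes skera, lima, isoseq and oarfish
# under exactly these names (the names used in this README's commands).
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}
```

For example, `missing_tools skera lima isoseq oarfish` should print nothing after `conda activate Long_read_RNA_seq_env`. Any name it prints is not available in the active environment.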

The Iso-Seq pipeline is documented here: https://isoseq.how/clustering/cli-workflow.html

2. Download the MAS adapter FASTA

Download the MAS adapter FASTA file. This will be either the MAS16 set or the MAS8 set, matching the array size (16 or 8 segments) used to build the Kinnex library.

MAS16 adapters:

wget https://raw.githubusercontent.com/mydennislab/Long_read_RNA_seq/refs/heads/main/mas16_primers.fasta

MAS8 adapters:

wget https://raw.githubusercontent.com/mydennislab/Long_read_RNA_seq/refs/heads/main/mas8_primers.fasta
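If a URL changes or a download is interrupted, the saved file can be missing, empty or not FASTA at all. It is worth confirming that each downloaded file contains FASTA records before you pass it to skera or lima. This is a minimal sketch. `count_fasta_records` is a hypothetical helper that only counts header lines starting with `>`.

```shell
# Count FASTA records (header lines starting with '>') in a file.
# Returns non-zero if the file is missing or empty; prints 0 if the file
# exists but contains no FASTA headers (e.g. it is not a FASTA file).
count_fasta_records() {
  local fasta="$1"
  if [ ! -s "$fasta" ]; then
    echo "missing or empty: $fasta" >&2
    return 1
  fi
  # grep -c prints 0 and exits 1 when nothing matches; '|| true' keeps the
  # printed count while ignoring that exit status.
  grep -c '^>' "$fasta" || true
}
```

For example, run `count_fasta_records mas8_primers.fasta`. A result of 0, or an error, means the download did not produce a usable FASTA file. The same check works for the barcoded Iso-Seq primer file.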

3. Run skera split to generate segmented reads

Use Skera to split Kinnex PacBio HiFi reads at the MAS adapter positions, generating segmented reads.

skera split -j <threads> <hifi_reads.bam> <adapter_primers.fasta> segmented_MASadapters.bam
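When a run produces several HiFi BAM files (for example, one per SMRT Cell), you can wrap the skera command in a loop. This is a hypothetical sketch that uses only the `skera split -j` invocation shown above. The following are all assumptions:
- the function name
- the `<input>.segmented.bam` output naming
- the default of 8 threads
- the THREADS, ADAPTERS and DRY_RUN variables

```shell
# Run `skera split` on each HiFi BAM given as an argument.
# Hypothetical conventions (not from the original pipeline):
#   output name  = <input without .bam>.segmented.bam
#   THREADS      = thread count (defaults to 8, an arbitrary choice)
#   ADAPTERS     = MAS adapter FASTA (required)
#   DRY_RUN=1    = print the commands instead of running them
run_skera_batch() {
  local threads="${THREADS:-8}"
  local adapters="${ADAPTERS:?set ADAPTERS to mas16_primers.fasta or mas8_primers.fasta}"
  local bam out
  for bam in "$@"; do
    out="${bam%.bam}.segmented.bam"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "skera split -j $threads $bam $adapters $out"
    else
      skera split -j "$threads" "$bam" "$adapters" "$out" || return 1
    fi
  done
}
```

For example, `ADAPTERS=mas16_primers.fasta DRY_RUN=1 run_skera_batch cell1.bam cell2.bam` prints the two commands it would run. Remove `DRY_RUN=1` to execute them.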

4. Download barcoded Iso-Seq primers

The barcoded Iso-Seq primers are here:

wget https://raw.githubusercontent.com/mydennislab/Long_read_RNA_seq/refs/heads/main/barcoded_IsoSeq_primers.fa

5. Primer removal and demultiplexing

Use lima to remove the Iso-Seq primers and demultiplex the segmented reads by barcode, producing full-length (FL) reads.

lima -j 60 --isoseq segmented_MASadapters.bam barcoded_IsoSeq_primers.fa fl_reads.bam

This creates several output files, including one BAM file per barcode pair detected. Run `cat fl_reads.lima.counts` to see the read counts for each pair and identify which barcoded adapter was used. One pair will have significantly more reads than the others. Use that pair's BAM file as the input to isoseq refine.
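Instead of reading fl_reads.lima.counts by eye, you can pick out the dominant barcode pair with awk. This sketch assumes the counts file is tab-separated, with a header row that includes `IdxCombinedNamed` (the barcode pair name) and `Counts` columns. Check the header of your own file first, because the column names are an assumption. `top_barcode_pair` is a hypothetical helper.

```shell
# Print "<barcode pair>\t<read count>" for the pair with the most reads.
# Assumption: the lima .counts file is tab-separated with a header row that
# contains "IdxCombinedNamed" and "Counts"; columns are located by name.
top_barcode_pair() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "IdxCombinedNamed") name_col = i
        if ($i == "Counts") count_col = i
      }
      if (!name_col || !count_col) { print "expected columns not found in header" > "/dev/stderr"; exit 1 }
      next
    }
    # Keep the row with the largest read count seen so far.
    $count_col + 0 > best + 0 { best = $count_col + 0; pair = $name_col }
    END { if (pair != "") print pair "\t" best }
  ' "$1"
}
```

Run it as `top_barcode_pair fl_reads.lima.counts`. Pass the lima BAM file whose name contains the printed pair to isoseq refine.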

6. Refine

Refinement involves:

- Trimming of poly(A) tails
- Rapid concatemer identification and removal

This produces full-length non-concatemer (FLNC) reads.
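To make the poly(A) trimming step concrete, here is a toy example on a plain sequence string. It is not how isoseq refine is implemented. It simply deletes a 3' run of at least MIN_A adenines, and the default of 20 is an arbitrary value chosen for illustration. `trim_polya` is a hypothetical helper.

```shell
# Toy poly(A) trimming: remove a run of >= MIN_A 'A' bases at the 3' end.
# NOT the isoseq refine algorithm; MIN_A (default 20) is a hypothetical value.
trim_polya() {
  local seq="$1" min_a="${2:-20}"
  # The regex A{min_a,}$ matches only a trailing run of adenines.
  printf '%s\n' "$seq" | sed -E "s/A{${min_a},}\$//"
}
```

A sequence ending in 25 adenines loses the whole tail. A tail shorter than MIN_A is left untouched, which loosely mirrors why `--require-polya` keeps only reads with a detectable tail.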

isoseq refine -j 60 --require-polya fl_reads.barcoded_IsoSeq_adapter_ID.bam barcoded_IsoSeq_primers.fa flnc_require-polya.bam

7. Oarfish: transcript quantification from long-read RNA-seq data

Use Oarfish to estimate transcript-level read counts.

Oarfish docs: https://github.com/COMBINE-lab/oarfish

Check oarfish flags:

oarfish -h

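Run Oarfish with the FLNC reads and a reference transcriptome as input. Pass one of the listed values to --seq-tech. For Kinnex (PacBio HiFi) reads, pac-bio-hifi is the matching option.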

oarfish --verbose --output <output_folder/output_header> -j <threads> --reads flnc_require-polya.bam --reference <transcriptome_ref.fa> --index-out <folder_with_transcriptome_ref.fa> --seq-tech <ont-cdna, ont-drna, pac-bio, pac-bio-hifi>

Oarfish writes its output files to the folder given in --output, prefixed with the output header (Sample in this example):

Sample.ambig_info.tsv
Sample.meta_info.json
Sample.quant

Sample.quant is a tab-separated table with these columns:
- the transcript name (tname)
- the transcript length
- the estimated number of reads assigned to each transcript (num_reads)
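For a quick look at the most abundant transcripts without leaving the shell, sort the .quant table on num_reads. This sketch assumes Sample.quant is tab-separated, with a header row containing `tname` and `num_reads`, the column names used in the R code in step 8. `top_transcripts` is a hypothetical helper.

```shell
# Print the top N transcripts by num_reads as "<tname>\t<num_reads>".
# Assumption: the .quant file is tab-separated with a header row containing
# "tname" and "num_reads"; N defaults to 10 (an arbitrary choice).
top_transcripts() {
  local quant="$1" n="${2:-10}"
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "tname") t = i
        if ($i == "num_reads") r = i
      }
      if (!t || !r) { print "header must contain tname and num_reads" > "/dev/stderr"; exit 1 }
      next
    }
    { print $t "\t" $r }
  ' "$quant" | LC_ALL=C sort -t "$(printf '\t')" -k2,2gr | head -n "$n"
}
```

For example, `top_transcripts Sample.quant 20` prints the 20 transcripts with the most estimated reads. The `g` sort key is used because num_reads values are fractional.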

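8. Summarise transcript counts to get gene-level counts:

Oarfish counts are per transcript (tname), so gene-level totals need a transcript-to-gene table. In the R code below, tx2gene.tsv is a hypothetical tab-separated file with columns tname and gene.

library(data.table)  # fread()
library(dplyr)       # inner_join(), group_by(), summarise(), %>%
tx2gene <- fread("tx2gene.tsv")  # hypothetical mapping: columns tname, gene
Gene_sum_oarfish_transcript_counts <- fread("Sample.quant") %>%
    inner_join(tx2gene, by = "tname") %>%  # attach a gene ID to each transcript
    group_by(gene) %>%
    summarise(gene_sum = sum(num_reads))
head(Gene_sum_oarfish_transcript_counts)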
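As a shell-only alternative to the R code above, you can also sum gene-level counts with awk. This needs a transcript-to-gene table. tx2gene.tsv here is a hypothetical tab-separated file with a header row (for example `tname` and `gene`), transcript IDs in column 1 and gene IDs in column 2. Its transcript IDs must match the tname values in Sample.quant exactly. `gene_counts` is likewise a hypothetical helper.

```shell
# Sum Oarfish num_reads per gene (a sketch, not part of Oarfish itself).
# Assumptions: both files are tab-separated with a header row; the
# hypothetical mapping file has transcript IDs in column 1 and gene IDs in
# column 2, and those transcript IDs match the .quant tname values exactly.
gene_counts() {
  local quant="$1" tx2gene="$2"
  awk -F'\t' '
    # First file (mapping): skip its header, remember transcript -> gene.
    FNR == NR { if (FNR > 1) gene[$1] = $2; next }
    # Second file (.quant): locate the tname and num_reads columns by name.
    FNR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "tname") t = i
        if ($i == "num_reads") r = i
      }
      if (!t || !r) { print "quant header must contain tname and num_reads" > "/dev/stderr"; exit 1 }
      next
    }
    # Accumulate estimated reads per gene; count transcripts with no mapping.
    ($t in gene) { sum[gene[$t]] += $r; next }
    { unmapped++ }
    END {
      for (g in sum) printf "%s\t%.2f\n", g, sum[g]
      if (unmapped) print unmapped " transcript(s) had no gene mapping" > "/dev/stderr"
    }
  ' "$tx2gene" "$quant" | LC_ALL=C sort
}
```

For example, run `gene_counts Sample.quant tx2gene.tsv > Sample.gene_counts.tsv`. Transcripts missing from the mapping are left out of the totals, and the number of such transcripts is reported on stderr. Sums are printed with two decimals, because awk's default number format would switch large totals to scientific notation.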
