
This code runs the analysis specified in Ortiz et al. as part of my MSc degree.

How to run it:

./pipeline_check.sh starts the analysis. By default it does not generate any graphics. To enable graphics generation, run it as ./pipeline_check.sh -g

Summary of analysis

  1. Collect protein families
  2. Run IUPred
  3. Generate tables (discrete and continuous) including gaps from alignment
  4. Run Gloome
  5. Generate graphics (optional)

Prerequisites

IUPred (iupred)
Remember to set the IUPred_PATH environment variable. In bash:
export IUPred_PATH='/path/to/iupred'

Gloome (gloome)
GNU Parallel
gnuplot
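
A quick pre-flight check for the external tools listed above could look like the sketch below. It is not part of the pipeline. The executable names (gloome, parallel, gnuplot) and the assumption that IUPred_PATH points to a directory are guesses that may need adjusting for your installation, and missing_prerequisites is a hypothetical helper name.

```python
import os
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["gloome", "parallel", "gnuplot"]


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    # IUPred_PATH is the variable named in this README; we assume it is a directory.
    iupred_dir = env.get("IUPred_PATH")
    if not iupred_dir:
        problems.append("IUPred_PATH is not set")
    elif not os.path.isdir(iupred_dir):
        problems.append(f"IUPred_PATH is not a directory: {iupred_dir}")

    # Each remaining tool only needs to be discoverable on PATH.
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when everything is found.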

Perl Modules

  • LWP::UserAgent
  • HTTP::Request::Common
  • Pod::Usage
  • Getopt::Regex
  • Config::File::Simple
  • cpan
  • BioPerl

Python modules

  • Biopython
  • pandas

Restrictions

Alignment files and their corresponding rooted tree files should reside in this directory.
The alignment file should be called .aln.fa (FASTA) and the tree file .aln.tre (Newick). This restriction was imposed so that the analysis of a large number of protein families can be automated.
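
Before starting a long run, it can help to confirm that every alignment has a tree and vice versa. The sketch below is not part of the pipeline. It assumes that the two files of a family share the same prefix in front of the .aln.fa and .aln.tre suffixes, and unpaired_families is a hypothetical helper name.

```python
from pathlib import Path

ALN_SUFFIX = ".aln.fa"    # alignment (FASTA), as named in this README
TREE_SUFFIX = ".aln.tre"  # rooted tree (Newick), as named in this README


def unpaired_families(directory="."):
    """Return (alignments without a tree, trees without an alignment) as sorted lists."""
    directory = Path(directory)
    # Strip the suffix so that both sets hold bare family prefixes.
    aln = {p.name[: -len(ALN_SUFFIX)] for p in directory.glob("*" + ALN_SUFFIX)}
    tre = {p.name[: -len(TREE_SUFFIX)] for p in directory.glob("*" + TREE_SUFFIX)}
    return sorted(aln - tre), sorted(tre - aln)


if __name__ == "__main__":
    no_tree, no_alignment = unpaired_families(".")
    for name in no_tree:
        print(f"{name}{ALN_SUFFIX} has no matching tree file")
    for name in no_alignment:
        print(f"{name}{TREE_SUFFIX} has no matching alignment file")
```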

Other restrictions

The names in both files should match (case-sensitive). It is highly recommended to avoid special characters or spaces in sequence names. A script in the directory taxa_names is included to parse alignments from GenBank sequences.
The tree file shouldn't have internal node names or any annotation (Gloome's restriction). The pipeline will try to remove bootstrap values.
The tree should be rooted (Gloome's restriction).
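
The restrictions above can be checked roughly with the standard library alone. The sketch below is not the pipeline's own code and all function names are hypothetical. It assumes simple, unquoted Newick names (in line with the advice to avoid special characters) and treats "two subtrees under the root" as a heuristic for a rooted tree, which may differ from what Gloome actually tests.

```python
import re


def fasta_names(fasta_text):
    """Sequence names: the first whitespace-delimited token of each '>' header."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def newick_leaf_names(newick):
    """Leaf names: labels that directly follow '(' or ','."""
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def strip_internal_labels(newick):
    """Drop any label written right after ')', such as bootstrap values or node names."""
    return re.sub(r"\)[^():,;]+", ")", newick)


def root_child_count(newick):
    """Count the subtrees directly under the outermost pair of parentheses."""
    depth = 0
    children = 0
    for ch in newick:
        if ch == "(":
            depth += 1
            if depth == 1:
                children = 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1
    return children


if __name__ == "__main__":
    fasta = ">A\nAC-G\n>B\nACTG\n>C\nAC-G\n"
    tree = "((A:0.1,B:0.2)90:0.3,C:0.4);"
    print("names match:", fasta_names(fasta) == newick_leaf_names(tree))
    print("cleaned tree:", strip_internal_labels(tree))
    print("root children:", root_child_count(tree))
```

A root with three or more children usually indicates an unrooted tree, so a count other than two is worth inspecting by hand.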

Output

Directory contents:

done/: Gloome run files for finished analyses
forplot/:

  • *.fortree: per-branch Gloome results (changes per tree length)
  • *.gp: commands for gnuplot (plotting results)
  • *.parsed: parsed Gloome results for gnuplot
  • *plot.svg: plot of parsed Gloome results (changes per tree length)
  • gloome_*/: raw Gloome results (see the Gloome documentation)
    • *.forgloome: 'sequence' file for Gloome, stored alongside the raw Gloome results
  • heatmaps/: iTOL output
    • *.1.svg: tree + heatmap
    • *.upitol: upload log from iTOL
  • iupred/: IUPred output and parsed tables
    • *.discrete: discrete table of IUPred results
    • *.iupred.parse.final: continuous table of IUPred results