diff --git a/README.md b/README.md
index 04121af6..c41f6a9f 100644
--- a/README.md
+++ b/README.md
@@ -2,20 +2,23 @@
 ### Overview
-The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package intended for analysis and clinical interpretation of individual cancer genomes. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces HTML reports that can be navigated by clinical oncologists (Figure 1).
+The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces interactive HTML reports intended for clinical interpretation (Figure 1).
 ![PCGR overview](PCGR_workflow.png)
 ### Example reports
-* View an example report for a colorectal tumor sample (TCGA)
-* View an example report for a breast tumor sample (TCGA)
+* Report for a colorectal tumor sample (TCGA)
+* Report for a breast tumor sample (TCGA)
 ### PCGR documentation
 [![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)
+If you use PCGR, please cite our paper:
-### Annotation resources included in PCGR (v0.3.2)
+Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and Eivind Hovig. __Personal Cancer Genome Reporter: Variant Interpretation Report For Precision Oncology__ (2017). bioRxiv. doi:[10.1101/122366](https://doi.org/10.1101/122366)
+
+### Annotation resources included in PCGR (v0.3.3)
 * [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
 * [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,16 +56,16 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/
 #### STEP 2: Download PCGR
-April 19th 2017: New release (0.3.2)
+April 20th 2017: New release (0.3.3)
-1. Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest)
+1. Download and unpack the [latest release (0.3.3)](https://github.com/sigven/pcgr/releases/latest)
 2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
- * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.2`)
+ * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.3`)
 * Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`
 A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
-3. Pull the [PCGR Docker image (0.3.2)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
- * `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine)
+3. Pull the [PCGR Docker image (0.3.3)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.2Gb):
+ * `docker pull sigven/pcgr:0.3.3` (PCGR annotation engine)
 #### STEP 3: Input preprocessing
@@ -73,12 +76,9 @@ The PCGR workflow accepts two types of input files:
 PCGR can be run with either or both of the two input files present.
-The following requirements __MUST__ be met by the input VCF for PCGR to work properly:
+* We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
+* If the input VCF contains multi-allelic sites, these will be subject to [decomposition](http://genome.sph.umich.edu/wiki/Vt#Decompose)
-1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. This can be done with the help of either [vt decompose](http://genome.sph.umich.edu/wiki/Vt#Decompose) or [vcflib's vcfbreakmulti](https://github.com/vcflib/vcflib#vcflib). We will add integrated support for this in an upcoming release
-2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by [vcftools](https://vcftools.github.io/perl_module.html#vcf-sort).
- * We strongly recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
- * 'chr' must be stripped from the chromosome names
 The tab-separated values file with copy number aberrations __MUST__ contain the following four columns:
 * Chromosome
@@ -112,7 +112,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t
 positional arguments:
 pcgr_dir PCGR base directory with accompanying data directory,
- e.g. ~/pcgr-0.3.2
+ e.g. ~/pcgr-0.3.3
 output_dir Output directory
 sample_id Tumor sample/cancer genome identifier - prefix for output files
@@ -146,7 +146,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t
 The _examples_ folder contain sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:
-`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`
+`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD`
 This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:
@@ -157,3 +157,7 @@ This command will run the Docker-based PCGR workflow and produce the following o
 5. __tumor_sample.COAD.pcgr.mutational_signatures.tsv__ - Tab-separated values file with estimated contributions by known mutational signatures and associated underlying etiologies
 6. __tumor_sample.COAD.pcgr.snvs_indels.biomarkers.tsv__ - Tab-separated values file with clinical evidence items associated with biomarkers for diagnosis, prognosis or drug sensitivity/resistance
 7. __tumor_sample.COAD.pcgr.cna_segments.tsv.gz__ - Tab-separated values file with annotations of gene transcripts that overlap with somatic copy number aberrations
+
+## Contact
+
+sigven@ifi.uio.no
diff --git a/docs/_build/doctrees/about.doctree b/docs/_build/doctrees/about.doctree
index 05efd90d..d23ecbb6 100644
Binary files a/docs/_build/doctrees/about.doctree and b/docs/_build/doctrees/about.doctree differ
diff --git a/docs/_build/doctrees/environment.pickle b/docs/_build/doctrees/environment.pickle
index 9f14d7d0..56ff6e01 100644
Binary files a/docs/_build/doctrees/environment.pickle and b/docs/_build/doctrees/environment.pickle differ
diff --git a/docs/_build/doctrees/getting_started.doctree b/docs/_build/doctrees/getting_started.doctree
index 8177a7a9..d4472f02 100644
Binary files a/docs/_build/doctrees/getting_started.doctree and b/docs/_build/doctrees/getting_started.doctree differ
diff --git a/docs/_build/doctrees/output.doctree b/docs/_build/doctrees/output.doctree
index 31d6ddef..6da63441 100644
Binary files a/docs/_build/doctrees/output.doctree and b/docs/_build/doctrees/output.doctree differ
diff --git a/docs/_build/html/_sources/about.rst.txt b/docs/_build/html/_sources/about.rst.txt
index 9a1ab9eb..7f1232e7 100644
--- a/docs/_build/html/_sources/about.rst.txt
+++ b/docs/_build/html/_sources/about.rst.txt
@@ -5,14 +5,15 @@ What is the Personal Cancer Genome Reporter (PCGR)?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The Personal Cancer Genome Reporter (PCGR) is a stand-alone software
-package intended for analysis and clinical interpretation of individual
-cancer genomes. It interprets both somatic SNVs/InDels and copy number
-aberrations. The software extends basic gene and variant annotations
-from the `Ensembl’s Variant Effect Predictor
+package for functional annotation and translation of individual cancer
+genomes for precision oncology. It interprets both somatic SNVs/InDels
+and copy number aberrations. The software extends basic gene and variant
+annotations from the `Ensembl’s Variant Effect Predictor
 (VEP) `__ with oncology-relevant, up-to-date annotations retrieved flexibly through
-`vcfanno `__, and produces HTML
-reports that can be navigated by clinical oncologists (Figure 1).
+`vcfanno `__, and produces
+interactive HTML reports intended for clinical interpretation (Figure
+1).
 .. figure:: PCGR_workflow.png
 :alt:
@@ -22,6 +23,12 @@ affiliated with the `Norwegian Cancer Genomics Consortium `__, at the `Institute for Cancer Research/Oslo University Hospital `__.
+Example reports
+^^^^^^^^^^^^^^^
+
+- Report for a colorectal tumor sample (TCGA)
+- Report for a breast tumor sample (TCGA)
+
 Why use PCGR?
 ~~~~~~~~~~~~~
@@ -37,6 +44,13 @@ and variant level. The application generates a tiered report that will aid the interpretation of individual cancer genomes in a clinical setting.
+If you use PCGR, please cite our paper:
+
+Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and
+Eivind Hovig. **Personal Cancer Genome Reporter: Variant Interpretation
+Report For Precision Oncology** (2017). bioRxiv.
+doi:\ `10.1101/122366 `__
+
 Docker-based technology
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -50,3 +64,8 @@ for precision oncology `__.
 .. figure:: docker-logo50.png
 :alt:
+
+Contact
+~~~~~~~
+
+sigven@ifi.uio.no
diff --git a/docs/_build/html/_sources/getting_started.rst.txt b/docs/_build/html/_sources/getting_started.rst.txt
index cce4c2d5..762972a8 100644
--- a/docs/_build/html/_sources/getting_started.rst.txt
+++ b/docs/_build/html/_sources/getting_started.rst.txt
@@ -42,10 +42,10 @@ terminal window.
 Download PCGR
 ^^^^^^^^^^^^^
-**April 19th 2017**: New release (0.3.2)
+**April 20th 2017**: New release (0.3.3)
 - Download and unpack the `latest release
- (0.3.2) `__
+ (0.3.3) `__
 - Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
@@ -53,7 +53,7 @@ Download PCGR
 - Download `the latest data bundle `__ from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
- version number, e.g. ``~/pcgr-0.3.2``)
+ version number, e.g. ``~/pcgr-0.3.3``)
 - Decompress and untar the bundle, e.g. through the following Unix command: ``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
@@ -62,10 +62,10 @@ Download PCGR
 have been produced
 - Pull the `PCGR Docker image
- - 0.3.2 `__ from DockerHub
- (3.1Gb) :
+ 0.3.3 `__ from DockerHub
+ (3.2Gb) :
- - ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine)
+ - ``docker pull sigven/pcgr:0.3.3`` (PCGR annotation engine)
 Run test - generation of clinical report for a cancer genome
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script
 positional arguments:
 pcgr_dir PCGR base directory with accompanying data directory,
- e.g. ~/pcgr-0.3.2
+ e.g. ~/pcgr-0.3.3
 output_dir Output directory
 sample_id Tumor sample/cancer genome identifier - prefix for output files
@@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be generated by running the following command in your terminal window:
 ``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
-``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD``
+``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD``
 This command will run the Docker-based PCGR workflow and produce the following output files in the *examples* folder:
diff --git a/docs/_build/html/_sources/output.rst.txt b/docs/_build/html/_sources/output.rst.txt
index e59a1c9d..5ad23bcd 100644
--- a/docs/_build/html/_sources/output.rst.txt
+++ b/docs/_build/html/_sources/output.rst.txt
@@ -18,23 +18,11 @@ currently supported.
 VCF
 ^^^
-The following requirements **MUST** be met by the input VCF for PCGR to
-work properly:
-
-1. Variants in the raw VCF that contain multiple alternative alleles
- (e.g. "multiple ALTs") must be split into variants with a single
- alternative allele. This can be done with the help of either `vt
- decompose `__ or
- `vcflib's vcfbreakmulti `__.
- We will add integrated support for this in an upcoming release
-2. The contents of the VCF must be sorted correctly (i.e. according to
- chromosomal order and chromosomal position). This can be obtained by
- `vcftools `__.
-
- - We **strongly** recommend that the input VCF is compressed and
- indexed using `bgzip `__ and
- `tabix `__
- - 'chr' must be stripped from the chromosome names
+- We **strongly** recommend that the input VCF is compressed and
+ indexed using `bgzip `__ and
+ `tabix `__
+- If the input VCF contains multi-allelic sites, these will be subject
+ to `decomposition `__
 **IMPORTANT NOTE 1**: Considering the VCF output for the `numerous somatic SNV/InDel callers `__ that
diff --git a/docs/_build/html/about.html b/docs/_build/html/about.html
index 45ef2300..635de714 100644
--- a/docs/_build/html/about.html
+++ b/docs/_build/html/about.html
@@ -89,9 +89,13 @@

Table of Contents

  • Getting started
diff --git a/docs/_build/html/output.html b/docs/_build/html/output.html
index 046536e9..6c827d66 100644
--- a/docs/_build/html/output.html
+++ b/docs/_build/html/output.html
@@ -181,25 +181,13 @@
 Input currently supported.
 VCF
-The following requirements MUST be met by the input VCF for PCGR to
-work properly:
-1. Variants in the raw VCF that contain multiple alternative alleles
-(e.g. “multiple ALTs”) must be split into variants with a single
-alternative allele. This can be done with the help of either vt
-decompose or
-vcflib’s vcfbreakmulti.
-We will add integrated support for this in an upcoming release
-2. The contents of the VCF must be sorted correctly (i.e. according to
-chromosomal order and chromosomal position). This can be obtained by
-vcftools.
 • We strongly recommend that the input VCF is compressed and indexed using bgzip and tabix
-• ‘chr’ must be stripped from the chromosome names
+• If the input VCF contains multi-allelic sites, these will be subject
+to decomposition
      IMPORTANT NOTE 1: Considering the VCF output for the numerous somatic SNV/InDel callers that have been developed, we have a experienced a general lack of uniformity diff --git a/docs/_build/html/searchindex.js b/docs/_build/html/searchindex.js index e0375803..46b3fd24 100644 --- a/docs/_build/html/searchindex.js +++ b/docs/_build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({docnames:["about","annotation_resources","getting_started","index","output"],envversion:50,filenames:["about.rst","annotation_resources.rst","getting_started.rst","index.rst","output.rst"],objects:{},objnames:{},objtypes:{},terms:{"1000g":4,"1000genom":1,"12th":[],"140453136a":1,"14th":[],"16gb":[],"17gb":2,"17th":[],"19th":2,"1gb":2,"1kg":[],"2016_03":[],"2016_09":1,"2020plu":2,"28th":[],"2gb":[],"5gb":2,"5th":1,"8th":1,"abstract":1,"case":[1,2,4],"class":4,"default":2,"function":[2,3,4],"import":[0,2,4],"new":2,"public":4,"short":[1,4],CDS:4,EAS:4,For:[1,2,4],IDs:4,POS:[],SAS:4,The:[0,1,2,4],There:0,These:[],_strong:[],abber:3,aberr:[0,2,4],about:3,abov:4,accept:4,acceptor:4,access:[0,4],accompani:2,accord:[2,4],acid:4,acquir:0,across:4,action:4,actual:1,ada:[],adapt:4,add:4,adddit:4,addit:[0,4],adenoma:1,adjust:4,advanc:2,af_norm:[],af_tumor:[],affect:4,affected_spl:[],affili:0,afr:4,afr_af_1kg:4,afr_af_exac:4,afr_af_gnomad:4,african:4,after:4,aggreg:4,aid:0,algorithm:4,align:4,all:[0,1],allel:4,allele_num:4,alon:0,alreadi:[1,2],also:1,alt:4,alter:[2,4],altern:4,american:4,amino:4,amino_acid:4,among:[],amplif:2,amr:4,amr_af_1kg:4,amr_af_exac:4,amr_af_gnomad:4,analys:2,analysi:0,analyz:4,ani:[2,4],annot:[0,2,3],annotation_resourc:[],antineoplast:[1,4],antineoplastic_drug_interact:4,antineoplastic_drugs_dgidb:4,append:4,appli:4,applic:[0,4],appri:4,approv:4,approx:2,april:[1,2],argument:2,asian:4,assembl:4,assign:2,associ:2,attach:[0,4],aug:[],b147:1,base:[2,3,4],basic:[0,2,3,4],been:[0,1,2,4],below:4,benign:[],best:4,betweeen:1,bgzip:4,bind:4,biocomput:4,bioconductor:4,biologi:0,biom
ark:[0,1,2],biotyp:4,block:4,block_substitut:[],bm_citat:4,bm_clinical_signific:4,bm_disease_nam:4,bm_drug_nam:4,bm_evidence_direct:4,bm_evidence_level:4,bm_evidence_typ:4,bm_rate:4,bool:[],boost:4,both:[0,4],braf:1,breast:4,browser:4,build:4,bundl:[0,1,2],cadd:4,call:2,call_confid:[],caller:4,can:[0,2,4],cancer:4,cancer_census_germlin:4,cancer_census_somat:4,cancer_mutation_hotspot:4,cancer_typ:4,cancerhotspot:4,candid:4,canon:4,cap:4,caption:[],care:0,carri:4,catalog:[1,4],catalogu:1,categori:4,caus:4,causal:4,cbmdb:1,cbmdb_id:4,ccd:4,cdna:4,cdna_posit:4,cds:[],cds_chang:4,cds_end_nf:4,cds_posit:4,cds_start_nf:4,cell:4,cell_typ:4,cellular:[],cencu:1,censu:4,challeng:0,chang:[2,4],check:2,chr1:4,chr7:1,chr:4,chrom:4,chrome:4,chromosom:4,citat:[],cite:4,civic:[1,4],civic_id:4,civic_id_2:4,classif:4,clin:[],clin_sig:4,clinic:[0,3],clinvar:[1,4],clinvar_msid:4,clinvar_pmid:4,clinvar_sig:4,clinvar_variant_origin:4,cluster:4,cna:2,cna_seg:[2,4],cnminor:[],cntotal:[],cnv:[],coad:2,code:3,codon:[1,4],codon_numb:4,cohort:1,coincid:4,collect:1,colorect:[2,4],column:4,com:[],come:[1,4],command:2,common:4,complet:[2,4],complex:0,composit:4,comprehens:0,compress:4,comput:2,concern:3,confer:1,confid:[],confirmed_somat:1,consensu:4,consequ:3,consid:4,consortium:[0,1,4],constitut:4,contain:[0,2,4],content:[3,4],context:0,contribut:[2,4],convent:4,coordin:4,copi:[0,2,3],correctli:4,correspond:1,cosmic:[1,4],cosmic_cancer_type_al:4,cosmic_cancer_type_gw:4,cosmic_codon_count_gw:4,cosmic_codon_frac_gw:4,cosmic_consequ:4,cosmic_count_gw:4,cosmic_drug_resist:4,cosmic_fathmm_pr:4,cosmic_mutation_id:4,cosmic_sample_sourc:4,cosmic_site_histolog:4,cosmic_vartyp:4,count:4,cover:4,cpu:2,creat:[],criteria:1,csq:4,curat:[1,4],current:4,damag:[],data:[0,2,3,4],databas:3,databundl:2,dataset:3,date:0,dbnsfp:[1,4],dbnsfp_consensus_lr:4,dbnsfp_consensus_svm:4,dbsnp:[1,4],dbsnp_mappingstatu:4,dbsnp_submiss:4,dbsnp_valid:4,dbsnpbuildid:4,dbsnprsid:4,dec:1,decompos:4,decompress:2,deconstructsig:4,dedi
c:[],defin:4,delet:[2,4],delin:4,denot:4,depend:0,depth:4,deriv:4,describ:4,descript:4,determin:[],develop:[0,4],dgidb:1,diagnosi:2,diagnost:[0,4],differ:[2,4],direct:[],directli:4,directori:2,discov:1,diseas:4,distanc:4,distribut:4,dna:4,doc:[],docker:3,dockerhub:2,docm:1,docm_diseas:4,docm_pmid:4,document:[],domain:[3,4],done:4,donor:4,download:[],downstream:2,dp_normal:[],dp_tumor:[],drive:2,driver:[1,4],drug:[1,2,4],dure:2,each:4,eas_af_1kg:4,eas_af_exac:4,eas_af_gnomad:4,east:4,effect:[0,3],effect_predict:4,either:[1,4],emerg:4,encourag:4,end:4,engin:2,ensembl:[0,4],ensembl_gene_id:4,ensembl_transcript_id:4,ensp:4,entrez:4,entrez_id:4,error:2,estim:2,etc:[1,4],etiolog:[2,4],eur:4,eur_af_1kg:4,european:4,event:4,evid:[2,4],exac:[1,4],exampl:[2,4],exist:[2,4],existing_vari:4,exit:2,exom:[1,4],exon:[1,4],experi:4,experienc:4,experiment:4,expert:0,explor:4,extend:0,facet:[],factor:4,fail:2,fall:4,fals:2,famili:1,fathmm:4,fathmm_mkl:4,fda:1,featur:[3,4],feature_typ:4,feb:[],februai:1,februari:1,figur:0,file:[2,4],fin:[],fin_af_exac:4,fin_af_gnomad:4,find:[0,4],finnish:4,firefox:4,first:4,flag:[2,4],flag_pick_allel:4,flank:4,flexibl:0,focu:[],folder:2,follow:[1,2,4],forc:2,force_overwrit:2,fork:2,form:0,format:0,found:[1,2,4],four:4,frac:[],fraction:[],fraction_mut:4,frameshift:4,frequenc:3,from:[0,1,2,4],g12:1,gain:4,gencod:[1,4],gencode_tag:4,gencode_transcript_typ:4,gencode_v19:4,gene:[0,2,3],gene_biotyp:4,gene_nam:4,gene_pheno:4,gene_symbol:4,gener:[0,3,4],genet:1,genindex:[],genom:4,genome_vers:4,genomic_chang:4,genotyp:4,germlin:1,gerp:4,get:3,getting_start:[],given:4,global:4,global_af_1kg:4,global_af_exac:4,global_af_gnomad:4,gnomad:1,googl:[2,4],grch37:[1,2,4],great:0,guidelin:[1,4],gwa:4,gwas_catalog_pmid:4,gwas_catalog_trait_uri:4,gz_:[],gzip:2,handl:4,has:[0,2,4],have:[0,1,2,4],hdiv:4,help:[2,4],here:4,hgnc:[],hgnc_id:4,hgv:[1,4],hgvs_offset:4,hgvsc:4,hgvsp:4,hgvsp_short:4,high:4,high_inf_po:4,higher:[],highlight:0,histolog:[1,4],hit:4,homozyg:2,hospit:0,
host:2,hotspot:[1,4],how:4,howev:1,html:[0,2,3],http:4,human:[1,4],humdiv:[],hvar:4,icgc:[1,4],icgc_project:4,identifi:[1,2,4],iii:0,imag:[0,2],impact:4,implic:4,improv:4,includ:[1,4],incomplet:4,indel:[0,2,3],index:4,indic:4,individu:[0,4],inf:[],infer:4,inferenti:4,info:4,inform:1,initi:4,input:[2,3],input_cna_seg:2,input_vcf:2,insert:4,insilico:3,instal:0,institut:0,instruct:2,integr:[0,4],intend:0,interact:[1,2,3],intern:1,interpret:[0,1,2],interrog:0,intersect:4,intogen:[1,4],intogen_driv:4,intogen_driver_mut:4,intro:[],intron:4,isoform:4,isol:0,item:[2,4],its:4,jan:4,june:[],kit:1,knowledg:[0,3],knowledgebas:1,known:[1,2,4],kra:1,lack:4,larg:[],latest:2,least:[],length:4,level:[0,1,4],librari:0,lies:4,like:4,limit:1,line:[],link:4,linux:2,list:[],literatur:[1,4],log:[2,4],logist:4,logr:4,logr_threshold_amplif:2,logr_threshold_homozygous_delet:2,lost:4,low:4,lrt:4,mac:2,machin:[2,4],maf:2,mai:[1,4],make:[],malign:1,mani:4,map:3,mappabl:4,mappingstatu:[],march:1,marker:1,master:[],match:4,matter:4,maxdepth:[],mean:4,measur:4,memori:2,messag:2,met:4,minim:2,minimum:[],minor:[],missens:4,mix:4,mkdir:[],mkl:[],modifi:4,modindex:[],modul:[],most:[0,1,4],motif:4,motif_nam:4,motif_po:4,motif_score_chang:4,motiffeatur:4,mozilla:4,mrna:4,msid:[],multipl:4,must:[1,2,4],mut:[],mutat:[0,1,2],mutational_signatur:[2,4],mutationassessor:4,mutationtast:4,mutect:[],mutpr:4,mutsigcv:2,name:4,navig:0,nccn:1,ncgc:[],need:0,nfe:[],nfe_af_exac:4,nfe_af_gnomad:4,nomenclatur:1,non:[1,4],none:2,normal:4,norwegian:0,notat:4,note:[3,4],nov:4,novemb:1,novo:4,now:2,nucleotid:[2,4],num:[],num_vcfanno_process:2,num_vep_fork:2,number:[0,2,3],numer:4,observ:4,obtain:4,oct:[],offset:[],oncogen:[1,4],oncolog:[0,2],oncologist:0,oncoscor:4,one:4,onli:[0,1,4],ontolog:4,option:[2,4],order:4,org:4,organ:[2,4],origin:4,oslo:0,osx:[],oth:[],oth_af_exac:4,oth_af_gnomad:4,other:[2,3],out:4,output:[2,3],output_dir:2,overlap:[2,4],overview:[],overwrit:2,packag:[0,4],page:[],pair:4,pars:4,part:[1,4],particu
lar:4,pass:4,pcgr:[1,3,4],pcgr_dir:2,pcgr_directori:[],pcgreport:[],percent:4,person:2,pfam:1,phase3:1,phase:4,pheno:4,phenotyp:4,phred:[],pick:4,pipelin:4,platform:2,pmid:4,point:4,polyp:1,polyphen2:4,portrai:4,pose:0,posit:[2,4],possibl:1,potenti:4,pre:4,precis:[0,2],pred:[],predict:[3,4],predictor:[0,1],predispos:4,predisposit:4,prefer:2,prefix:2,prerequisit:3,present:[0,4],primari:4,princip:4,prinicip:[],priorit:0,process:[2,4],produc:[0,2,4],product:4,profil:4,prognosi:2,prognost:[0,4],program:2,project:4,properli:4,proport:1,propos:4,proposed_aetiolog:4,prot:4,protein:3,protein_chang:4,protein_domain:4,protein_posit:4,provean:4,provid:4,pubm:4,pull:2,python:[],qualiti:[3,4],queri:[2,4],quickstart:[],ram:2,rang:4,rate:4,rather:4,ratio:[2,4],raw:4,recommend:4,record:4,ref:[],refer:[1,4],reflect:4,refseq:4,refseq_match:4,refut:4,regress:4,regulatori:4,regulatoryfeatur:4,rel:4,relat:[0,1],releas:[1,2,4],relev:[0,2,4],replac:2,report:1,reported_in_another_cancer_sample_as_somat:1,repres:2,represent:4,requir:[0,1,2,4],research:0,resist:[2,4],resourc:[0,2,3],respect:4,restart:2,result:[0,2,4],retriev:[0,4],revel:4,rich:2,robust:4,root:[],rsid:4,run:[3,4],run_pcgr:[],safari:4,sampl:[1,2,4],sample_id:[2,4],sample_pair_identifi:[],sampleid:4,sas_af_1kg:4,sas_af_exac:4,sas_af_gnomad:4,satisfi:1,scale:4,scarciti:0,scientif:1,scientist:0,score:4,screen:4,script:2,search:[],segment:2,segment_end:4,segment_length:4,segment_mean:4,segment_start:4,sensit:[2,4],sep:[],separ:2,sequenc:[1,2,4],set:[0,2,4],sever:0,shift:4,shortest:4,should:2,show:[2,4],sift:4,sig:[],sigantur:[],signatur:2,signature_id:4,signific:[1,4],sigven:2,similarli:2,singl:4,site:[1,4],snv:[0,2,3],snvs_indel:[2,4],softwar:[0,2],somat:[0,1,2,3],sort:4,sourc:4,south:4,specif:4,sphinx:[],splice:4,splice_site_effect_ada:4,splice_site_effect_bool:[],splice_site_effect_rf:4,split:4,stabl:4,stand:0,standard:4,star:4,start:[3,4],statement:4,statist:1,statu:[1,4],step:[],stop:4,strand:4,strelka:[],strip:4,strive:4,str
ong:[],strongli:4,structur:4,studi:4,subject:4,submiss:4,submit:4,subset:1,substitut:[],subtyp:4,support:4,suppressor:[1,4],svm:[],swiss:4,swissprot:[1,4],symbol:4,symbol_sourc:4,synonym:[1,4],systemat:0,tab:2,tabix:4,tabl:3,tag:4,take:2,taken:0,tar:2,target:4,tcga:[2,4],technolog:3,termin:2,test:[3,4],test_sampl:[],tfbp:4,tgz:2,thei:1,therapeut:[0,4],therapi:4,therefor:4,thi:[1,2,4],those:4,through:[0,2,4],throughput:[],thu:0,tier:[0,2,4],tier_descript:4,toctre:[],todo:[],tool:[0,2],toolbar:2,total:4,trait:4,transcript:[2,4],transcript_end:4,transcript_overlap_perc:4,transcript_start:4,transvar:1,treatment:4,trembl:4,treshold:2,trial:[1,4],trust:4,tsgene:[1,4],tsgene_oncogen:4,tsl:4,tsv:2,tumor:[0,1,2,4],tumor_sampl:2,tumor_suppressor:4,tumor_typ:4,tumorigenesi:4,two:[2,4],type:[2,4],unambigu:1,unannot:4,underli:[2,4],uniform:4,uniparc:4,uniprot:[1,4],uniprot_featur:4,uniprot_id:4,uniprotkb:4,uniqu:4,univers:0,unix:2,unpack:2,untar:2,upcom:4,upon:[],upper:4,uri:4,usag:2,use:[2,3],used:[1,2],user:4,using:[0,2,4],util:3,v15:[],v19:[1,4],v22:[],v23:1,v30:[],v31:1,v600e:1,v78:[],v80:1,v85:1,valid:4,valu:2,variabl:4,variant:[0,2,3],variant_class:4,variat:4,variou:4,vartyp:[],vcf:2,vcf_sample_id:4,vcfanno:[0,2],vcfbreakmulti:4,vcflib:4,vcftool:4,vector:4,vep:[0,1,2],vep_all_consequ:4,veri:2,version:[2,4],view:4,virtual:2,weak:[],weak_mutect:[],weak_strelka:[],weight:4,what:3,whenev:1,where:4,whether:4,which:[0,1,2,4],why:3,wide:[1,4],window:2,within:[1,2,4],work:4,workflow:[0,2,4],working_directori:[],wtsi:4,xvf:2,you:2,your:2,yyyymmdd:2},titles:["About","Annotation resources","Getting started","Welcome to Personal Cancer Genome Reporter’s documentation!","Input & 
output"],titleterms:{"function":1,abber:4,about:0,all:4,among:4,annot:[1,4],associ:4,base:[0,1],basic:1,biomark:4,both:[],call:4,cancer:[0,1,2,3],clinic:[1,2,4],code:[1,4],concern:1,consequ:[1,4],copi:4,data:1,databas:[1,4],dataset:1,differ:[],dna:[],docker:[0,2],document:3,domain:1,download:2,drug:[],effect:[1,4],etc:[],exampl:[],featur:1,format:4,frequenc:[1,4],gene:[1,4],gener:2,genom:[0,1,2,3],germlin:4,get:2,hotspot:[],html:4,includ:[],indel:4,indic:[],inform:4,input:4,insilico:1,instal:2,interact:4,introduct:[],knowledg:1,list:4,map:1,marker:[],mutat:4,ncgc:[],note:1,number:4,oncovarexplor:[],other:[1,4],output:4,packag:[],pcgr:[0,2],person:[0,3],predict:1,preprocess:[],prerequisit:2,protein:[1,4],python:2,qualiti:1,report:[0,2,3,4],resourc:1,run:2,segment:4,sensit:[],separ:4,signatur:4,snv:4,somat:4,sourc:[],start:2,tab:4,tabl:[],technolog:0,test:2,tsv:4,tumor:[],type:[],use:0,util:1,valu:4,variant:[1,4],variat:[],vcf:4,vep:4,welcom:3,what:0,why:0}}) \ No newline at end of file +Search.setIndex({docnames:["about","annotation_resources","getting_started","index","output"],envversion:50,filenames:["about.rst","annotation_resources.rst","getting_started.rst","index.rst","output.rst"],objects:{},objnames:{},objtypes:{},terms:{"1000g":4,"1000genom":1,"12th":[],"140453136a":1,"14th":[],"16gb":[],"17gb":2,"17th":[],"19th":[],"1gb":[],"1kg":[],"2016_03":[],"2016_09":1,"2020plu":2,"20th":2,"28th":[],"2gb":2,"5gb":2,"5th":1,"8th":1,"abstract":1,"case":[1,2,4],"class":4,"default":2,"function":[0,2,3,4],"import":[0,2,4],"new":2,"public":4,"short":[1,4],"vod\u00e1k":0,CDS:4,EAS:4,For:[0,1,2,4],IDs:4,POS:[],SAS:4,The:[0,1,2,4],There:0,These:[],_strong:[],aaasheim:0,abber:3,aberr:[0,2,4],about:3,abov:4,accept:4,acceptor:4,access:[0,4],accompani:2,accord:[2,4],acid:4,acquir:0,across:4,action:4,actual:1,ada:[],adapt:4,add:[],adddit:4,addit:[0,4],adenoma:1,adjust:4,advanc:2,af_norm:[],af_tumor:[],affect:4,affected_spl:[],affili:0,afr:4,afr_af_1kg:4,afr_af_exac:4,afr_af_gnomad:
4,african:4,after:4,aggreg:4,aid:0,algorithm:4,align:4,all:[0,1],allel:4,allele_num:4,alon:0,alreadi:[1,2],also:1,alt:[],alter:[2,4],altern:4,american:4,amino:4,amino_acid:4,among:[],amplif:2,amr:4,amr_af_1kg:4,amr_af_exac:4,amr_af_gnomad:4,analys:2,analysi:[],analyz:4,ani:[2,4],annot:[0,2,3],annotation_resourc:[],antineoplast:[1,4],antineoplastic_drug_interact:4,antineoplastic_drugs_dgidb:4,append:4,appli:4,applic:[0,4],appri:4,approv:4,approx:2,april:[1,2],argument:2,asian:4,assembl:4,assign:2,associ:2,attach:[0,4],aug:[],b147:1,base:[2,3,4],basic:[0,2,3,4],been:[0,1,2,4],below:4,benign:[],best:4,betweeen:1,bgzip:4,bind:4,biocomput:4,bioconductor:4,biologi:0,biomark:[0,1,2],biorxiv:0,biotyp:4,birger:0,block:4,block_substitut:[],bm_citat:4,bm_clinical_signific:4,bm_disease_nam:4,bm_drug_nam:4,bm_evidence_direct:4,bm_evidence_level:4,bm_evidence_typ:4,bm_rate:4,bool:[],boost:4,both:[0,4],braf:1,breast:[0,4],browser:4,build:4,bundl:[0,1,2],cadd:4,call:2,call_confid:[],caller:4,can:[0,2,4],cancer:4,cancer_census_germlin:4,cancer_census_somat:4,cancer_mutation_hotspot:4,cancer_typ:4,cancerhotspot:4,candid:4,canon:4,cap:4,caption:[],care:0,carri:4,catalog:[1,4],catalogu:1,categori:4,caus:4,causal:4,cbmdb:1,cbmdb_id:4,ccd:4,cdna:4,cdna_posit:4,cds:[],cds_chang:4,cds_end_nf:4,cds_posit:4,cds_start_nf:4,cell:4,cell_typ:4,cellular:[],cencu:1,censu:4,challeng:0,chang:[2,4],check:2,chr1:4,chr7:1,chr:[],chrom:4,chrome:4,chromosom:4,citat:[],cite:[0,4],civic:[1,4],civic_id:4,civic_id_2:4,classif:4,clin:[],clin_sig:4,clinic:[0,3],clinvar:[1,4],clinvar_msid:4,clinvar_pmid:4,clinvar_sig:4,clinvar_variant_origin:4,cluster:4,cna:2,cna_seg:[2,4],cnminor:[],cntotal:[],cnv:[],coad:2,code:3,codon:[1,4],codon_numb:4,cohort:1,coincid:4,collect:1,colorect:[0,2,4],column:4,com:[],come:[1,4],command:2,common:4,complet:[2,4],complex:0,composit:4,comprehens:0,compress:4,comput:2,concern:3,confer:1,confid:[],confirmed_somat:1,consensu:4,consequ:3,consid:4,consortium:[0,1,4],constitut:4,contact:
3,contain:[0,2,4],content:3,context:0,contribut:[2,4],convent:4,coordin:4,copi:[0,2,3],correctli:4,correspond:1,cosmic:[1,4],cosmic_cancer_type_al:4,cosmic_cancer_type_gw:4,cosmic_codon_count_gw:4,cosmic_codon_frac_gw:4,cosmic_consequ:4,cosmic_count_gw:4,cosmic_drug_resist:4,cosmic_fathmm_pr:4,cosmic_mutation_id:4,cosmic_sample_sourc:4,cosmic_site_histolog:4,cosmic_vartyp:4,count:4,cover:4,cpu:2,creat:[],criteria:1,csq:4,curat:[1,4],current:4,damag:[],daniel:0,data:[0,2,3,4],databas:3,databundl:2,dataset:3,date:0,dbnsfp:[1,4],dbnsfp_consensus_lr:4,dbnsfp_consensus_svm:4,dbsnp:[1,4],dbsnp_mappingstatu:4,dbsnp_submiss:4,dbsnp_valid:4,dbsnpbuildid:4,dbsnprsid:4,dec:1,decompos:[],decomposit:4,decompress:2,deconstructsig:4,dedic:[],defin:4,delet:[2,4],delin:4,denot:4,depend:0,depth:4,deriv:4,describ:4,descript:4,determin:[],develop:[0,4],dgidb:1,diagnosi:2,diagnost:[0,4],differ:[2,4],direct:[],directli:4,directori:2,discov:1,diseas:4,distanc:4,distribut:4,dna:4,doc:[],docker:3,dockerhub:2,docm:1,docm_diseas:4,docm_pmid:4,document:[],doi:0,domain:[3,4],done:[],donor:4,download:[],downstream:2,dp_normal:[],dp_tumor:[],drive:2,driver:[1,4],drug:[1,2,4],dure:2,each:4,eas_af_1kg:4,eas_af_exac:4,eas_af_gnomad:4,east:4,effect:[0,3],effect_predict:4,either:[1,4],eivind:0,emerg:4,encourag:4,end:4,engin:2,ensembl:[0,4],ensembl_gene_id:4,ensembl_transcript_id:4,ensp:4,entrez:4,entrez_id:4,error:2,estim:2,etc:[1,4],etiolog:[2,4],eur:4,eur_af_1kg:4,european:4,event:4,evid:[2,4],exac:[1,4],exampl:[2,4],exist:[2,4],existing_vari:4,exit:2,exom:[1,4],exon:[1,4],experi:4,experienc:4,experiment:4,expert:0,explor:4,extend:0,facet:[],factor:4,fail:2,fall:4,fals:2,famili:1,fathmm:4,fathmm_mkl:4,fda:1,featur:[3,4],feature_typ:4,feb:[],februai:1,februari:1,figur:0,file:[2,4],fin:[],fin_af_exac:4,fin_af_gnomad:4,find:[0,4],finnish:4,firefox:4,first:4,flag:[2,4],flag_pick_allel:4,flank:4,flexibl:0,focu:[],folder:2,follow:[1,2,4],forc:2,force_overwrit:2,fork:2,form:0,format:0,found:[1,2,4],four:4,
fournou:0,frac:[],fraction:[],fraction_mut:4,frameshift:4,frequenc:3,from:[0,1,2,4],g12:1,gain:4,gencod:[1,4],gencode_tag:4,gencode_transcript_typ:4,gencode_v19:4,gene:[0,2,3],gene_biotyp:4,gene_nam:4,gene_pheno:4,gene_symbol:4,gener:[0,3,4],genet:1,genindex:[],genom:4,genome_vers:4,genomic_chang:4,genotyp:4,germlin:1,gerp:4,get:3,getting_start:[],ghislain:0,given:4,global:4,global_af_1kg:4,global_af_exac:4,global_af_gnomad:4,gnomad:1,googl:[2,4],grch37:[1,2,4],great:0,guidelin:[1,4],gwa:4,gwas_catalog_pmid:4,gwas_catalog_trait_uri:4,gz_:[],gzip:2,handl:4,has:[0,2,4],have:[0,1,2,4],hdiv:4,help:2,here:4,hgnc:[],hgnc_id:4,hgv:[1,4],hgvs_offset:4,hgvsc:4,hgvsp:4,hgvsp_short:4,high:4,high_inf_po:4,higher:[],highlight:0,histolog:[1,4],hit:4,homozyg:2,hospit:0,host:2,hotspot:[1,4],hovig:0,how:4,howev:1,html:[0,2,3],http:4,human:[1,4],humdiv:[],hvar:4,icgc:[1,4],icgc_project:4,identifi:[1,2,4],ifi:0,iii:0,imag:[0,2],impact:4,implic:4,improv:4,includ:[1,4],incomplet:4,indel:[0,2,3],index:4,indic:4,individu:[0,4],inf:[],infer:4,inferenti:4,info:4,inform:1,initi:4,input:[2,3],input_cna_seg:2,input_vcf:2,insert:4,insilico:3,instal:0,institut:0,instruct:2,integr:0,intend:0,interact:[0,1,2,3],intern:1,interpret:[0,1,2],interrog:0,intersect:4,intogen:[1,4],intogen_driv:4,intogen_driver_mut:4,intro:[],intron:4,isoform:4,isol:0,item:[2,4],its:4,jan:4,june:[],kit:1,knowledg:[0,3],knowledgebas:1,known:[1,2,4],kra:1,lack:4,lar:0,larg:[],latest:2,least:[],length:4,level:[0,1,4],librari:0,lies:4,like:4,limit:1,line:[],link:4,linux:2,list:[],literatur:[1,4],log:[2,4],logist:4,logr:4,logr_threshold_amplif:2,logr_threshold_homozygous_delet:2,lost:4,low:4,lrt:4,mac:2,machin:[2,4],maf:2,mai:[1,4],make:[],malign:1,mani:4,map:3,mappabl:4,mappingstatu:[],march:1,marker:1,master:[],match:4,matter:4,maxdepth:[],mean:4,measur:4,memori:2,messag:2,met:[],minim:2,minimum:[],minor:[],missens:4,mix:4,mkdir:[],mkl:[],modifi:4,modindex:[],modul:[],most:[0,1,4],motif:4,motif_nam:4,motif_po:4,motif_score_c
hang:4,motiffeatur:4,mozilla:4,mrna:4,msid:[],multi:4,multipl:[],must:[1,2,4],mut:[],mutat:[0,1,2],mutational_signatur:[2,4],mutationassessor:4,mutationtast:4,mutect:[],mutpr:4,mutsigcv:2,nakken:0,name:4,navig:[],nccn:1,ncgc:[],need:0,nfe:[],nfe_af_exac:4,nfe_af_gnomad:4,nomenclatur:1,non:[1,4],none:2,normal:4,norwegian:0,notat:4,note:[3,4],nov:4,novemb:1,novo:4,now:2,nucleotid:[2,4],num:[],num_vcfanno_process:2,num_vep_fork:2,number:[0,2,3],numer:4,observ:4,obtain:[],oct:[],offset:[],oncogen:[1,4],oncolog:[0,2],oncologist:[],oncoscor:4,one:4,onli:[0,1,4],ontolog:4,option:[2,4],order:[],org:4,organ:[2,4],origin:4,oslo:0,osx:[],oth:[],oth_af_exac:4,oth_af_gnomad:4,other:[2,3],our:0,out:4,output:[2,3],output_dir:2,overlap:[2,4],overview:[],overwrit:2,packag:[0,4],page:[],pair:4,paper:0,pars:4,part:[1,4],particular:4,pass:4,pcgr:[1,3,4],pcgr_dir:2,pcgr_directori:[],pcgreport:[],percent:4,person:2,pfam:1,phase3:1,phase:4,pheno:4,phenotyp:4,phred:[],pick:4,pipelin:4,platform:2,pleas:0,pmid:4,point:4,polyp:1,polyphen2:4,portrai:4,pose:0,posit:[2,4],possibl:1,potenti:4,pre:4,precis:[0,2],pred:[],predict:[3,4],predictor:[0,1],predispos:4,predisposit:4,prefer:2,prefix:2,prerequisit:3,present:[0,4],primari:4,princip:4,prinicip:[],priorit:0,process:[2,4],produc:[0,2,4],product:4,profil:4,prognosi:2,prognost:[0,4],program:2,project:4,properli:[],proport:1,propos:4,proposed_aetiolog:4,prot:4,protein:3,protein_chang:4,protein_domain:4,protein_posit:4,provean:4,provid:4,pubm:4,pull:2,python:[],qualiti:[3,4],queri:[2,4],quickstart:[],ram:2,rang:4,rate:4,rather:4,ratio:[2,4],raw:[],recommend:4,record:4,ref:[],refer:[1,4],reflect:4,refseq:4,refseq_match:4,refut:4,regress:4,regulatori:4,regulatoryfeatur:4,rel:4,relat:[0,1],releas:[1,2,4],relev:[0,2,4],replac:2,report:1,reported_in_another_cancer_sample_as_somat:1,repres:2,represent:4,requir:[0,1,2,4],research:0,resist:[2,4],resourc:[0,2,3],respect:4,restart:2,result:[0,2,4],retriev:[0,4],revel:4,rich:2,robust:4,root:[],rsid:4,run:[3,4
],run_pcgr:[],safari:4,sampl:[0,1,2,4],sample_id:[2,4],sample_pair_identifi:[],sampleid:4,sas_af_1kg:4,sas_af_exac:4,sas_af_gnomad:4,satisfi:1,scale:4,scarciti:0,scientif:1,scientist:0,score:4,screen:4,script:2,search:[],segment:2,segment_end:4,segment_length:4,segment_mean:4,segment_start:4,sensit:[2,4],sep:[],separ:2,sequenc:[1,2,4],set:[0,2,4],sever:0,shift:4,shortest:4,should:2,show:[2,4],sift:4,sig:[],sigantur:[],signatur:2,signature_id:4,signific:[1,4],sigv:0,sigven:[0,2],similarli:2,singl:4,site:[1,4],snv:[0,2,3],snvs_indel:[2,4],softwar:[0,2],somat:[0,1,2,3],sort:[],sourc:4,south:4,specif:4,sphinx:[],splice:4,splice_site_effect_ada:4,splice_site_effect_bool:[],splice_site_effect_rf:4,split:[],stabl:4,stand:0,standard:4,star:4,start:[3,4],statement:4,statist:1,statu:[1,4],step:[],stop:4,strand:4,strelka:[],strip:[],strive:4,strong:[],strongli:4,structur:4,studi:4,subject:4,submiss:4,submit:4,subset:1,substitut:[],subtyp:4,support:4,suppressor:[1,4],svm:[],swiss:4,swissprot:[1,4],symbol:4,symbol_sourc:4,synonym:[1,4],systemat:0,tab:2,tabix:4,tabl:3,tag:4,take:2,taken:0,tar:2,target:4,tcga:[0,2,4],technolog:3,termin:2,test:[3,4],test_sampl:[],tfbp:4,tgz:2,thei:1,therapeut:[0,4],therapi:4,therefor:4,thi:[1,2,4],those:4,through:[0,2,4],throughput:[],thu:0,tier:[0,2,4],tier_descript:4,toctre:[],todo:[],tool:[0,2],toolbar:2,total:4,trait:4,transcript:[2,4],transcript_end:4,transcript_overlap_perc:4,transcript_start:4,translat:0,transvar:1,treatment:4,trembl:4,treshold:2,trial:[1,4],trust:4,tsgene:[1,4],tsgene_oncogen:4,tsl:4,tsv:2,tumor:[0,1,2,4],tumor_sampl:2,tumor_suppressor:4,tumor_typ:4,tumorigenesi:4,two:[2,4],type:[2,4],uio:0,unambigu:1,unannot:4,underli:[2,4],uniform:4,uniparc:4,uniprot:[1,4],uniprot_featur:4,uniprot_id:4,uniprotkb:4,uniqu:4,univers:0,unix:2,unpack:2,untar:2,upcom:[],upon:[],upper:4,uri:4,usag:2,use:[2,3],used:[1,2],user:4,using:[0,2,4],util:3,v15:[],v19:[1,4],v22:[],v23:1,v30:[],v31:1,v600e:1,v78:[],v80:1,v85:1,valid:4,valu:2,variabl:4,vari
ant:[0,2,3],variant_class:4,variat:4,variou:4,vartyp:[],vcf:2,vcf_sample_id:4,vcfanno:[0,2],vcfbreakmulti:[],vcflib:[],vcftool:[],vector:4,vep:[0,1,2],vep_all_consequ:4,veri:2,version:[2,4],view:4,virtual:2,weak:[],weak_mutect:[],weak_strelka:[],weight:4,what:3,whenev:1,where:4,whether:4,which:[0,1,2,4],why:3,wide:[1,4],window:2,within:[1,2,4],work:[],workflow:[0,2,4],working_directori:[],wtsi:4,xvf:2,you:[0,2],your:2,yyyymmdd:2},titles:["About","Annotation resources","Getting started","Welcome to Personal Cancer Genome Reporter’s documentation!","Input & output"],titleterms:{"function":1,abber:4,about:0,all:4,among:4,annot:[1,4],associ:4,base:[0,1],basic:1,biomark:4,both:[],call:4,cancer:[0,1,2,3],clinic:[1,2,4],code:[1,4],concern:1,consequ:[1,4],contact:0,copi:4,data:1,databas:[1,4],dataset:1,differ:[],dna:[],docker:[0,2],document:3,domain:1,download:2,drug:[],effect:[1,4],etc:[],exampl:0,featur:1,format:4,frequenc:[1,4],gene:[1,4],gener:2,genom:[0,1,2,3],germlin:4,get:2,hotspot:[],html:4,includ:[],indel:4,indic:[],inform:4,input:4,insilico:1,instal:2,interact:4,introduct:[],knowledg:1,list:4,map:1,marker:[],mutat:4,ncgc:[],note:1,number:4,oncovarexplor:[],other:[1,4],output:4,packag:[],pcgr:[0,2],person:[0,3],predict:1,preprocess:[],prerequisit:2,protein:[1,4],python:2,qualiti:1,report:[0,2,3,4],resourc:1,run:2,segment:4,sensit:[],separ:4,signatur:4,snv:4,somat:4,sourc:[],start:2,tab:4,tabl:[],technolog:0,test:2,tsv:4,tumor:[],type:[],use:0,util:1,valu:4,variant:[1,4],variat:[],vcf:4,vep:4,welcom:3,what:0,why:0}}) \ No newline at end of file diff --git a/docs/about.md b/docs/about.md index 63fe76ad..7d9c3036 100644 --- a/docs/about.md +++ b/docs/about.md @@ -2,19 +2,32 @@ ### What is the Personal Cancer Genome Reporter (PCGR)? -The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package intended for analysis and clinical interpretation of individual cancer genomes. It interprets both somatic SNVs/InDels and copy number aberrations. 
The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces HTML reports that can be navigated by clinical oncologists (Figure 1). +The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces interactive HTML reports intended for clinical interpretation (Figure 1). ![](PCGR_workflow.png) The Personal Cancer Genome Reporter has been developed by scientists affiliated with the [Norwegian Cancer Genomics Consortium](http://cancergenomics.no), at the [Institute for Cancer Research/Oslo University Hospital](http://radium.no). +#### Example reports +* Report for a colorectal tumor sample (TCGA) +* Report for a breast tumor sample (TCGA) + + ### Why use PCGR? The great complexity of acquired mutations in individual tumor genomes poses a severe challenge for clinical interpretation. There is a general scarcity of tools that can _i)_ systematically interrogate cancer genomes in the context of diagnostic, prognostic, and therapeutic biomarkers, _ii)_ prioritize and highlight the most important findings, and _iii)_ present the results in a format accessible to clinical experts. PCGR integrates a comprehensive set of knowledge resources related to tumor biology and therapeutic biomarkers, both at the gene and variant level. 
The application generates a tiered report that will aid the interpretation of individual cancer genomes in a clinical setting. +If you use PCGR, please cite our paper: + +Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and Eivind Hovig. __Personal Cancer Genome Reporter: Variant Interpretation Report For Precision Oncology__ (2017). bioRxiv. doi:[10.1101/122366](https://doi.org/10.1101/122366) + ### Docker-based technology The PCGR workflow is developed using the [Docker technology](https://www.docker.com/what-docker). The software is thus packaged into isolated containers, in which the installation of all software libraries/tools and required dependencies have been taken care of. In addition to the bundled software, in the form of a Docker image, the workflow only needs to be attached with an [annotation data bundle for precision oncology](annotation_resources.html). ![](docker-logo50.png) + +### Contact + +sigven@ifi.uio.no diff --git a/docs/about.rst b/docs/about.rst index 9a1ab9eb..7f1232e7 100644 --- a/docs/about.rst +++ b/docs/about.rst @@ -5,14 +5,15 @@ What is the Personal Cancer Genome Reporter (PCGR)? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The Personal Cancer Genome Reporter (PCGR) is a stand-alone software -package intended for analysis and clinical interpretation of individual -cancer genomes. It interprets both somatic SNVs/InDels and copy number -aberrations. The software extends basic gene and variant annotations -from the `Ensembl’s Variant Effect Predictor +package for functional annotation and translation of individual cancer +genomes for precision oncology. It interprets both somatic SNVs/InDels +and copy number aberrations. The software extends basic gene and variant +annotations from the `Ensembl’s Variant Effect Predictor (VEP) `__ with oncology-relevant, up-to-date annotations retrieved flexibly through -`vcfanno `__, and produces HTML -reports that can be navigated by clinical oncologists (Figure 1). 
+`vcfanno `__, and produces +interactive HTML reports intended for clinical interpretation (Figure +1). .. figure:: PCGR_workflow.png :alt: @@ -22,6 +23,12 @@ affiliated with the `Norwegian Cancer Genomics Consortium `__, at the `Institute for Cancer Research/Oslo University Hospital `__. +Example reports +^^^^^^^^^^^^^^^ + +- Report for a colorectal tumor sample (TCGA) +- Report for a breast tumor sample (TCGA) + Why use PCGR? ~~~~~~~~~~~~~ @@ -37,6 +44,13 @@ and variant level. The application generates a tiered report that will aid the interpretation of individual cancer genomes in a clinical setting. +If you use PCGR, please cite our paper: + +Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, and +Eivind Hovig. **Personal Cancer Genome Reporter: Variant Interpretation +Report For Precision Oncology** (2017). bioRxiv. +doi:\ `10.1101/122366 `__ + Docker-based technology ~~~~~~~~~~~~~~~~~~~~~~~ @@ -50,3 +64,8 @@ for precision oncology `__. .. figure:: docker-logo50.png :alt: + +Contact +~~~~~~~ + +sigven@ifi.uio.no diff --git a/docs/getting_started.md b/docs/getting_started.md index 6906b861..d82c7f67 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -23,18 +23,18 @@ An installation of Python (version 2.7.13) is required to run PCGR. Check that P #### Download PCGR -__April 19th 2017__: New release (0.3.2) +__April 20th 2017__: New release (0.3.3) -* Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest) +* Download and unpack the [latest release (0.3.3)](https://github.com/sigven/pcgr/releases/latest) * Download and unpack the data bundle (approx. 17Gb) in the PCGR directory - * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. 
`~/pcgr-0.3.2`) + * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3.3`) * Decompress and untar the bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -` A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced -* Pull the [PCGR Docker image - 0.3.2](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb) : - * `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine) +* Pull the [PCGR Docker image - 0.3.3](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.2Gb) : + * `docker pull sigven/pcgr:0.3.3` (PCGR annotation engine) ### Run test - generation of clinical report for a cancer genome @@ -55,7 +55,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__, whi positional arguments: pcgr_dir PCGR base directory with accompanying data directory, - e.g. ~/pcgr-0.3.2 + e.g. ~/pcgr-0.3.3 output_dir Output directory sample_id Tumor sample/cancer genome identifier - prefix for output files @@ -90,7 +90,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__, whi The _examples_ folder contain input files from two tumor samples sequenced within TCGA. 
A report for a colorectal tumor case can be generated by running the following command in your terminal window: `python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments ` -`examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD` +`examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD` This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder: diff --git a/docs/getting_started.rst b/docs/getting_started.rst index cce4c2d5..762972a8 100644 --- a/docs/getting_started.rst +++ b/docs/getting_started.rst @@ -42,10 +42,10 @@ terminal window. Download PCGR ^^^^^^^^^^^^^ -**April 19th 2017**: New release (0.3.2) +**April 20th 2017**: New release (0.3.3) - Download and unpack the `latest release - (0.3.2) `__ + (0.3.3) `__ - Download and unpack the data bundle (approx. 17Gb) in the PCGR directory @@ -53,7 +53,7 @@ Download PCGR - Download `the latest data bundle `__ from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the - version number, e.g. ``~/pcgr-0.3.2``) + version number, e.g. ``~/pcgr-0.3.3``) - Decompress and untar the bundle, e.g. through the following Unix command: ``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`` @@ -62,10 +62,10 @@ Download PCGR have been produced - Pull the `PCGR Docker image - - 0.3.2 `__ from DockerHub - (3.1Gb) : + 0.3.3 `__ from DockerHub + (3.2Gb) : - - ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine) + - ``docker pull sigven/pcgr:0.3.3`` (PCGR annotation engine) Run test - generation of clinical report for a cancer genome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script positional arguments: pcgr_dir PCGR base directory with accompanying data directory, - e.g. ~/pcgr-0.3.2 + e.g. 
~/pcgr-0.3.3 output_dir Output directory sample_id Tumor sample/cancer genome identifier - prefix for output files @@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be generated by running the following command in your terminal window: ``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments`` -``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`` +``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD`` This command will run the Docker-based PCGR workflow and produce the following output files in the *examples* folder: diff --git a/docs/output.md b/docs/output.md index 02caeb1f..1dbcf1c5 100644 --- a/docs/output.md +++ b/docs/output.md @@ -13,12 +13,8 @@ __IMPORTANT NOTE__: Only the GRCh37 version of the human genome is currently sup #### VCF -The following requirements __MUST__ be met by the input VCF for PCGR to work properly: - -1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. This can be done with the help of either [vt decompose](http://genome.sph.umich.edu/wiki/Vt#Decompose) or [vcflib's vcfbreakmulti](https://github.com/vcflib/vcflib#vcflib). We will add integrated support for this in an upcoming release -2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by [vcftools](https://vcftools.github.io/perl_module.html#vcf-sort). 
- * We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html) - * 'chr' must be stripped from the chromosome names +* We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html) +* If the input VCF contains multi-allelic sites, these will be subject to [decomposition](http://genome.sph.umich.edu/wiki/Vt#Decompose) __IMPORTANT NOTE 1__: Considering the VCF output for the [numerous somatic SNV/InDel callers](https://www.biostars.org/p/19104/) that have been developed, we have a experienced a general lack of uniformity and robustness for the representation of somatic variant genotype data (e.g. variant allelic depths (tumor/normal), genotype quality etc.). In the output results provided within the current version of PCGR, we are considering PASSed variants only, and variant genotype data (i.e. as found in the VCF SAMPLE columns) are not handled or parsed. As improved standards for this matter may emerge, we will strive to include this information in the annotated output files. diff --git a/docs/output.rst b/docs/output.rst index e59a1c9d..5ad23bcd 100644 --- a/docs/output.rst +++ b/docs/output.rst @@ -18,23 +18,11 @@ currently supported. VCF ^^^ -The following requirements **MUST** be met by the input VCF for PCGR to -work properly: - -1. Variants in the raw VCF that contain multiple alternative alleles - (e.g. "multiple ALTs") must be split into variants with a single - alternative allele. This can be done with the help of either `vt - decompose `__ or - `vcflib's vcfbreakmulti `__. - We will add integrated support for this in an upcoming release -2. The contents of the VCF must be sorted correctly (i.e. according to - chromosomal order and chromosomal position). This can be obtained by - `vcftools `__. 
- - - We **strongly** recommend that the input VCF is compressed and - indexed using `bgzip `__ and - `tabix `__ - - 'chr' must be stripped from the chromosome names +- We **strongly** recommend that the input VCF is compressed and + indexed using `bgzip `__ and + `tabix `__ +- If the input VCF contains multi-allelic sites, these will be subject + to `decomposition `__ **IMPORTANT NOTE 1**: Considering the VCF output for the `numerous somatic SNV/InDel callers `__ that diff --git a/pcgr.py b/pcgr.py index 5b594dfe..df032982 100755 --- a/pcgr.py +++ b/pcgr.py @@ -8,7 +8,7 @@ import logging import sys -version = '0.3.2' +version = '0.3.3' def __main__(): @@ -32,6 +32,7 @@ def __main__(): if args.force_overwrite is True: overwrite = 1 + ## check that script and Docker image version correspond check_docker_command = 'docker images -q ' + str(docker_image_version) output = subprocess.check_output(str(check_docker_command), stderr=subprocess.STDOUT, shell=True) if(len(output) == 0): @@ -47,6 +48,9 @@ def __main__(): def check_input_files(input_vcf, input_cna_segments, base_pcgr_dir, output_dir, overwrite, sample_id, logger): + """ + Function that checks the input files and directories provided by the user and checks for their existence + """ input_vcf_dir = "NA" input_cna_dir = "NA" @@ -176,6 +180,9 @@ def getlogger(logger_name): return logger def run_pcgr(host_directories, docker_image_version, logR_threshold_amplification, logR_threshold_homozygous_deletion, num_vcfanno_processes, num_vep_forks, sample_id): + """ + Main function to run the PCGR workflow using Docker + """ ## set basic Docker run commands output_vcf = 'None' diff --git a/src/Dockerfile b/src/Dockerfile index abcf74e5..8d8bf67d 100755 --- a/src/Dockerfile +++ b/src/Dockerfile @@ -1,18 +1,21 @@ ############################################################ # Dockerfile to build Personal Cancer Genome Reporter (PCGR) # Main software components: -# 1. Variant Effect Predictor (VEP) +# 1. 
Variant Effect Predictor (VEP 85) # 2. vcfanno (0.0.11) -# 3. custom scripts (pcgr.tgz) +# 3. R (3.3.3) and R packages +# 3. Custom Python scripts (pcgr.tgz) and R package (pcgrr2) ############################################################ -# use Debian:testing as base image +## use Debian:testing as base image FROM debian:jessie-backports -# set non interactive +## set non interactive ENV DEBIAN_FRONTEND=noninteractive + +## define packages to be installed ENV PACKAGE_BIO="tabix samtools libhts1 bedtools" -ENV PACKAGE_DEV="perl debconf-utils build-essential gfortran python-dev python-pip gcc-multilib autoconf zlib1g-dev git libncurses5-dev libblas-dev liblapack-dev cpanminus libcurl4-gnutls-dev libssh2-1-dev libxml2-dev vim libssl-dev openssl libcairo2-dev" +ENV PACKAGE_DEV="perl debconf-utils build-essential gfortran python-dev python-pip gcc-multilib autoconf zlib1g-dev liblzma-dev git libncurses5-dev libblas-dev liblapack-dev cpanminus libcurl4-gnutls-dev libssh2-1-dev libxml2-dev vim libssl-dev openssl libcairo2-dev libbz2-dev" ENV PYTHON_MODULES="numpy cython scipy transvar bx-python pyvcf cyvcf cyvcf2 biopython crossmap pandas" ENV R_BASE_VERSION 3.3.3 @@ -33,14 +36,11 @@ RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \ ENV LC_ALL en_US.UTF-8 ENV LANG en_US.UTF-8 -#deb http://ftp.debian.org/debian jessie-backports main RUN echo "deb http://http.debian.net/debian jessie-backports main" > /etc/apt/sources.list.d/debian-jessie-backports.list \ && echo 'APT::Default-Release "jessie-backports";' > /etc/apt/apt.conf.d/default -#RUN echo "deb http://http.debian.net/debian sid main" > /etc/apt/sources.list.d/debian-unstable.list \ - #&& echo 'APT::Default-Release "testing";' > /etc/apt/apt.conf.d/default -## Now install R and littler, and create a link for littler in /usr/local/bin -## Also set a default CRAN repo, and make sure littler knows about it too + +## Install R RUN apt-get update && apt-get install -y --no-install-recommends \ littler \ #r-cran-littler 
\ @@ -57,23 +57,23 @@ RUN apt-get update && apt-get install -y --no-install-recommends \ && rm -rf /tmp/downloaded_packages/ /tmp/*.rds \ && rm -rf /var/lib/apt/lists/* -# Install pandoc (for HTML reports) +## Install pandoc (for HTML report generation) RUN wget https://github.com/jgm/pandoc/releases/download/1.19.1/pandoc-1.19.1-1-amd64.deb && \ dpkg -i pandoc* && \ rm pandoc* && \ apt-get clean -# Install necessary R packages +## Install necessary R packages RUN R -e "install.packages(c('dplyr','stringr','tidyr','ggplot2','httr','git2r','data.table','magrittr','devtools','DT'), dependencies = T, repos = 'http://cran.us.r-project.org')" RUN R -e "source(\"https://bioconductor.org/biocLite.R\"); biocLite(c('deconstructSigs', 'KEGGREST','VariantAnnotation','BSgenome.Hsapiens.UCSC.hg19','GenomeInfoDb','GenomicRanges','S4Vectors'))" RUN R -e "library(devtools); devtools::install_github('mjkallen/rlogging')" -# Install tools used for compilation +## Install tools used for compilation RUN pip install -U setuptools RUN pip install $PYTHON_MODULES -# Install vcfanno version 0.0.11 +## Install vcfanno version 0.0.11 RUN wget https://github.com/brentp/vcfanno/releases/download/v0.0.11/vcfanno_0.0.11_linux_amd64.tar.gz && \ tar xvzf vcfanno_0.0.11_linux_amd64.tar.gz && \ mv vcfanno_0.0.11_linux_amd64/vcfanno /usr/local/bin && \ @@ -81,13 +81,12 @@ RUN wget https://github.com/brentp/vcfanno/releases/download/v0.0.11/vcfanno_0.0 rm -rf vcfanno_0.0.11_linux_amd64 -# Install Ensembl's Vcf-validator +## Install Ensembl's Vcf-validator RUN wget https://github.com/EBIvariation/vcf-validator/releases/download/v0.4.2/vcf_validator && \ mv vcf_validator /usr/local/bin && \ chmod 755 /usr/local/bin/vcf_validator -# Install VEP -# Necessary Perl modules +# Install VEP's required Perl modules RUN cpanm File::ShareDir::Install \ && cpanm Data::UUID \ && cpanm autodie \ @@ -102,8 +101,8 @@ RUN cpanm File::ShareDir::Install \ && cpanm CGI \ && cpanm DBI \ && cpanm Archive::Tar -# && cpanm 
Bio::DB::HTS \ +## Set up VEP's cache/data folders VOLUME /usr/local/share/vep/data ENV VEP_DATA="/usr/local/share/vep/data" ENV VEP_DATA_DOCKER="/usr/local/share/vep/data" @@ -115,6 +114,8 @@ ENV PATH=$PATH:$VEP_PATH/htslib ENV SPECIES="homo_sapiens" ENV ASSEMBLY="GRCh37" ENV VEPPLUGIN="LoF,TSSDistance" + +## Clone and install VEP/85 RUN git clone -b release/85 https://github.com/Ensembl/ensembl-tools.git WORKDIR $VEP_INSTDIR RUN printf 'y\n' | perl INSTALL.pl --AUTO a --SPECIES $SPECIES --ASSEMBLY $ASSEMBLY --PLUGINS $VEPPLUGIN --DESTDIR $VEP_PATH --CACHEDIR $VEP_DATA @@ -128,18 +129,26 @@ RUN make RUN cp bin/vcfbreakmulti /usr/local/bin RUN rm -rf /vcflib -# Add local PCGR R package +## Add local PCGR R package WORKDIR / ADD R/ / RUN R -e "devtools::install('pcgrr2')" -# Add local PCGR Python scripts/libraries +## Add vt (for decomposition of multi-allelic sites in query VCF) +RUN git clone https://github.com/atks/vt.git +WORKDIR vt +RUN make +RUN make test +RUN cp vt /usr/local/bin + + +## Add local PCGR Python scripts/libraries ADD pcgr.tgz / ENV PATH=$PATH:/pcgr ENV PYTHONPATH=:/pcgr/lib:${PYTHONPATH} ENV VCFANNO_DATA_DOCKER="/data" -# Clean Up +## Clean Up RUN apt-get clean autoclean RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* RUN rm -rf /var/lib/{dpkg,cache,log} diff --git a/src/R/pcgrr2/.Rhistory b/src/R/pcgrr2/.Rhistory index b1190f86..c3c4f4c7 100644 --- a/src/R/pcgrr2/.Rhistory +++ b/src/R/pcgrr2/.Rhistory @@ -1,10 +1,3 @@ -cna_segments <- dplyr::rename(cna_segments, SEGMENT_LENGTH = segment_length, SEGMENT = segment_link) -cna_segments <- dplyr::select(cna_segments, SEGMENT, SEGMENT_LENGTH, LogR) %>% dplyr::distinct() -cna_segments_filtered <- dplyr::filter(segments, LogR >= logR_threshold_amplification | LogR <= logR_threshold_homozygous_deletion) -cna_segments_filtered <- cna_segments_filtered %>% dplyr::arrange(desc(LogR)) -rlogging::message(paste0("Detected ",nrow(cna_segments_filtered)," segments subject to amplification/deletion")) 
-cna_segments_filtered <- dplyr::filter(cna_segments, LogR >= logR_threshold_amplification | LogR <= logR_threshold_homozygous_deletion) -cna_segments_filtered <- cna_segments_filtered %>% dplyr::arrange(desc(LogR)) rlogging::message(paste0("Detected ",nrow(cna_segments_filtered)," segments subject to amplification/deletion")) getwd() project_directory <- '/Users/sigven/research/docker/pcgr/examples' @@ -510,3 +503,10 @@ generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf cna_biomarkers <- data.frame() nrow<(cna_biomarkers) nrow(cna_biomarkers) +vcf_gz_file <- '~/Downloads/test_jianming.pcgr.vcf.gz' +vcf_data_vr <- VariantAnnotation::readVcfAsVRanges(vcf_gz_file,genome = "hg19") +vcf_data_vr <- vcf_data_vr[VariantAnnotation::called(vcf_data_vr)] +vcf_data_vr <- pcgrr2::postprocess_vranges_info(vcf_data_vr) +vcf_data_df <- as.data.frame(vcf_data_vr) +vcf_data_df +library(deconstructSigs) diff --git a/src/pcgr.tgz b/src/pcgr.tgz index 6b96cce1..5eb9d4e8 100644 Binary files a/src/pcgr.tgz and b/src/pcgr.tgz differ diff --git a/src/pcgr/lib/pcgrutils.py b/src/pcgr/lib/pcgrutils.py index 988d48d4..7b34eeb3 100755 --- a/src/pcgr/lib/pcgrutils.py +++ b/src/pcgr/lib/pcgrutils.py @@ -166,6 +166,9 @@ def index_uniprot_features(uniprot_feature_fname): return uniprot_features def get_uniprot_data_by_transcript(up_xref, transcript_id, csq): + """ + Function that retrieves UniProt annotation (uniprot id and protein sequence match) for a given transcript + """ uniprot_mappings = None uniprot_ids = {} @@ -254,33 +257,6 @@ def index_uniprot_feature_names(sp_features_fname): return swissprot_features -def index_clinvar(clinvar_tsv_fname): - clinvar_xref = {} - with gzip.open(clinvar_tsv_fname, 'rb') as tsvfile: - cv_reader = csv.DictReader(tsvfile, delimiter='\t') - for rec in cv_reader: - - unique_traits = {} - traits = '' - traits = rec['all_traits'] - for t in traits.split(';'): - t_lc = str(t).lower() - unique_traits[t_lc] = 1 - origin = '' - origin 
= rec['origin'] - - traits_curated = ';'.join(unique_traits.keys()) - traits_origin = traits_curated + ' - ' + str(origin) - - clinvar_xref[rec['measureset_id']] = {} - clinvar_xref[rec['measureset_id']]['phenotype_origin'] = traits_origin - if rec['symbol'] == '-' or rec['symbol'] == 'more than 10': - rec['symbol'] = 'NA' - clinvar_xref[rec['measureset_id']]['genesymbol'] = rec['symbol'] - - return clinvar_xref - - def getlogger(logger_name): logger = logging.getLogger(logger_name) logger.setLevel(logging.DEBUG) diff --git a/src/pcgr/lib/utils.py b/src/pcgr/lib/utils.py deleted file mode 100755 index 724a2ae6..00000000 --- a/src/pcgr/lib/utils.py +++ /dev/null @@ -1,48 +0,0 @@ -#!/usr/bin/env python - -import os,re,sys -import csv - -csv.field_size_limit(500 * 1024 * 1024) - -def read_infotag_file(vcf_info_tags_tsv): - info_tag_xref = {} - if not os.path.exists(vcf_info_tags_tsv): - return info_tag_xref - with open(vcf_info_tags_tsv, 'rb') as tsvfile: - vep_reader = csv.DictReader(tsvfile, delimiter='\t') - for rec in vep_reader: - if not info_tag_xref.has_key(rec['tag']): - info_tag_xref[rec['tag']] = rec - - return info_tag_xref - -def index_cancer_hotspots(cancer_hotspot_path): - hotspot_xref = {} - if not os.path.exists(cancer_hotspot_path): - return hotspot_xref - with open(cancer_hotspot_path, 'rb') as tsvfile: - ch_reader = csv.DictReader(tsvfile, delimiter='\t', quotechar='#') - for rec in ch_reader: - if 'splice' in rec['Codon']: - continue - gene = str(rec['Hugo Symbol']).upper() - codon = str(re.sub(r'[A-Z]','',rec['Codon'])) - if not hotspot_xref.has_key(gene): - hotspot_xref[gene] = {} - hotspot_xref[gene][codon] = rec - return hotspot_xref - -def map_cancer_hotspots(cancer_hotspot_xref, vep_info_tags, protein_info_tags): - - for alt_allele in vep_info_tags['Feature'].keys(): - hotspot_hits = {} - symbol = vep_info_tags['SYMBOL'][alt_allele] - consequence = vep_info_tags['Consequence'][alt_allele] - if 
protein_info_tags['PROTEIN_POSITIONS'].has_key(alt_allele): - if 'missense_variant' in consequence or 'stop_gained' in consequence: - if cancer_hotspot_xref.has_key(symbol): - for codon in protein_info_tags['PROTEIN_POSITIONS'][alt_allele].keys(): - if cancer_hotspot_xref[symbol].has_key(str(codon)): - cancer_hotspot_description = str(symbol) + '|' + str(cancer_hotspot_xref[symbol][str(codon)]['Codon']) + '|' + str(cancer_hotspot_xref[symbol][str(codon)]['Q-value']) - protein_info_tags['CANCER_MUTATION_HOTSPOT'][alt_allele] = cancer_hotspot_description diff --git a/src/pcgr/pcgr_check_input.py b/src/pcgr/pcgr_check_input.py index 8502daa3..2cf06b2c 100755 --- a/src/pcgr/pcgr_check_input.py +++ b/src/pcgr/pcgr_check_input.py @@ -122,9 +122,8 @@ def check_existing_vcf_info_tags(input_vcf, pcgr_directory, logger): ret = 1 for k in vcf_reader.infos.keys(): if k in vep_infotags_desc.keys() or k in pcgr_infotags_desc.keys() or k in vcfanno_tags.keys() or k == 'EFFECT_PREDICTIONS': - if k != 'STRAND': - logger.error('INFO tag ' + str(k) + ' in the query VCF coincides with a VCF annotation tag produced by PCGR - please remove or rename this tag in your query VCF') - ret = -1 + logger.error('INFO tag ' + str(k) + ' in the query VCF coincides with a VCF annotation tag produced by PCGR - please remove or rename this tag in your query VCF') + ret = -1 return ret def verify_pcgr_input(pcgr_directory, input_vcf, input_cna_segments): @@ -132,13 +131,13 @@ def verify_pcgr_input(pcgr_directory, input_vcf, input_cna_segments): Function that reads the input files to PCGR (VCF file and Tab-separated values file with copy number segments) and performs the following checks: 1. Check that VCF file is properly formatted (according to EBIvariation/vcf-validator - VCF v4.2) 2. Check that no INFO annotation tags in the query VCF coincides with those generated by PCGR - 3. Check that 'chr' is stripped from CHROM column in VCF file - 4. 
Check that no variants have multiple alternative alleles (e.g. 'A,T') - 5. Check that copy number segment file has required columns and correct data types (and range) - 6. Any genotype data from VCF input file is stripped, and the resulting VCF file is sorted and indexed (bgzip + tabix) + 3. Check that if VCF have variants with multiple alternative alleles (e.g. 'A,T') run vt decompose + 4. Check that copy number segment file has required columns and correct data types (and range) + 5. Any genotype data from VCF input file is stripped, and the resulting VCF file is sorted and indexed (bgzip + tabix) """ logger = pcgrutils.getlogger('pcgr-check-input') - input_vcf_pcgr_ready = '/workdir/output/' + re.sub(r'(\.vcf$|\.vcf\.gz$)','.pcgr_ready.vcf',os.path.basename(input_vcf)) + input_vcf_pcgr_ready = '/workdir/output/' + re.sub(r'(\.vcf$|\.vcf\.gz$)','.pcgr_ready.tmp.vcf',os.path.basename(input_vcf)) + input_vcf_pcgr_ready_decomposed = '/workdir/output/' + re.sub(r'(\.vcf$|\.vcf\.gz$)','.pcgr_ready.vcf',os.path.basename(input_vcf)) if not input_vcf == 'None': logger.info('Validating VCF file with EBIvariation/vcf-validator') @@ -169,35 +168,34 @@ def verify_pcgr_input(pcgr_directory, input_vcf, input_cna_segments): multiallelic_alt = 0 vcf = VCF(input_vcf) for rec in vcf: - chrom = rec.CHROM - if chrom.startswith('chr'): - error_message_chrom = "'chr' must be stripped from chromosome names: " + str(rec.CHROM + ", see http://pcgr.readthedocs.io/en/latest/output.html#vcf-preprocessing") - logger.error(error_message_chrom) - return -1 POS = rec.start + 1 alt = ",".join(str(n) for n in rec.ALT) if len(rec.ALT) > 1: - logger.error('') - logger.error("Multiallelic site detected:" + str(rec.CHROM) + '\t' + str(POS) + '\t' + str(rec.REF) + '\t' + str(alt)) - logger.error('Alternative alleles must be decomposed, see http://pcgr.readthedocs.io/en/latest/output.html#vcf-preprocessing') - logger.error('') + logger.warning("Multiallelic site detected:" + str(rec.CHROM) + '\t' + 
str(POS) + '\t' + str(rec.REF) + '\t' + str(alt)) multiallelic_alt = 1 - return -1 command_vcf_sample_free1 = 'egrep \'^##\' ' + str(input_vcf) + ' > ' + str(input_vcf_pcgr_ready) command_vcf_sample_free2 = 'egrep \'^#CHROM\' ' + str(input_vcf) + ' | cut -f1-8 >> ' + str(input_vcf_pcgr_ready) - command_vcf_sample_free3 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | cut -f1-8 | egrep -v \'^[XYM]\' | sort -k1,1n -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) - command_vcf_sample_free4 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | cut -f1-8 | egrep \'^[XYM]\' | sort -k1,1 -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free3 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | sed \'s/^chr//\' | cut -f1-8 | egrep -v \'^[XYM]\' | sort -k1,1n -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free4 = 'egrep -v \'^#\' ' + str(input_vcf) + ' | sed \'s/^chr//\' | cut -f1-8 | egrep \'^[XYM]\' | sort -k1,1 -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) if input_vcf.endswith('.gz'): command_vcf_sample_free1 = 'bgzip -dc ' + str(input_vcf) + ' | egrep \'^##\' > ' + str(input_vcf_pcgr_ready) command_vcf_sample_free2 = 'bgzip -dc ' + str(input_vcf) + ' | egrep \'^#CHROM\' | cut -f1-8 >> ' + str(input_vcf_pcgr_ready) - command_vcf_sample_free3 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | cut -f1-8 | egrep -v \'^[XYM]\' | sort -k1,1n -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) - command_vcf_sample_free4 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | cut -f1-8 | egrep \'^[XYM]\' | sort -k1,1 -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free3 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | sed \'s/^chr//\' | cut -f1-8 | egrep -v \'^[XYM]\' | sort -k1,1n -k2,2n -k3,3 -k4,4 >> ' + str(input_vcf_pcgr_ready) + command_vcf_sample_free4 = 'bgzip -dc ' + str(input_vcf) + ' | egrep -v \'^#\' | sed \'s/^chr//\' | cut -f1-8 | egrep \'^[XYM]\' | sort -k1,1 -k2,2n -k3,3 -k4,4 >> ' + 
str(input_vcf_pcgr_ready) os.system(command_vcf_sample_free1) os.system(command_vcf_sample_free2) os.system(command_vcf_sample_free3) os.system(command_vcf_sample_free4) - os.system('bgzip -f ' + str(input_vcf_pcgr_ready)) - os.system('tabix -p vcf ' + str(input_vcf_pcgr_ready) + '.gz') + if multiallelic_alt == 1: + logger.info('Decomposing multi-allelic sites in input VCF file using \'vt decompose\'') + command_decompose = 'vt decompose -s ' + str(input_vcf_pcgr_ready) + ' > ' + str(input_vcf_pcgr_ready_decomposed) + ' 2> /workdir/output/decompose.log' + os.system(command_decompose) + else: + command_copy = 'cp ' + str(input_vcf_pcgr_ready) + ' ' + str(input_vcf_pcgr_ready_decomposed) + os.system(command_copy) + os.system('bgzip -f ' + str(input_vcf_pcgr_ready_decomposed)) + os.system('tabix -p vcf ' + str(input_vcf_pcgr_ready_decomposed) + '.gz') + os.system('rm -f ' + str(input_vcf_pcgr_ready) + ' /workdir/output/decompose.log') if not input_cna_segments == 'None': ret = is_valid_cna_segment_file(input_cna_segments, logger)
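
Review note: the preprocessing in `verify_pcgr_input` is now spread over four shell pipelines (`egrep`, `sed`, `cut`, `sort`), which makes the intended behaviour hard to read from the diff. The sketch below restates that logic in plain Python, only as an illustration of what the pipelines appear to do. It is not code from this repository: the function name `prepare_vcf_lines` is hypothetical, multi-allelic sites are detected by a comma in the ALT column rather than through the VCF parser used in the patch, and the ordering only approximates `sort -k1,1n -k2,2n -k3,3 -k4,4`.

```python
def prepare_vcf_lines(lines):
    """Hypothetical helper mirroring the shell pipelines in the patch.

    Takes an iterable of VCF text lines and returns (output_lines, has_multiallelic).
    Genotype columns are dropped (first 8 columns kept), a leading 'chr' is
    stripped from CHROM in record lines only, and records are ordered with
    numerically named chromosomes first, then those starting with X, Y or M.
    """
    meta, column_header, numeric_chroms, xym_chroms = [], [], [], []
    has_multiallelic = False

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information lines are copied unchanged
            meta.append(line)
        elif line.startswith("#CHROM"):
            # Keep CHROM..INFO only, like 'cut -f1-8'
            column_header.append("\t".join(line.split("\t")[:8]))
        else:
            fields = line.split("\t")[:8]
            if fields[0].startswith("chr"):
                # Same effect as sed 's/^chr//' on the record line
                fields[0] = fields[0][3:]
            if "," in fields[4]:
                # Assumption: more than one ALT allele is written as 'A,T'
                has_multiallelic = True
            if fields[0][:1] in ("X", "Y", "M"):
                xym_chroms.append(fields)
            else:
                numeric_chroms.append(fields)

    def numeric_key(fields):
        # Non-numeric names sort as 0, similar to a numeric sort key
        chrom = int(fields[0]) if fields[0].isdigit() else 0
        return (chrom, int(fields[1]), fields[2], fields[3])

    numeric_chroms.sort(key=numeric_key)
    xym_chroms.sort(key=lambda f: (f[0], int(f[1]), f[2], f[3]))

    records = ["\t".join(f) for f in numeric_chroms + xym_chroms]
    return meta + column_header + records, has_multiallelic
```

In the patch, the equivalent of `has_multiallelic` decides whether `vt decompose -s` is run on the temporary file or the file is simply copied before `bgzip` and `tabix`. Doing the filtering in one pass would also avoid reading the input VCF four times, though that is a design suggestion rather than something this change requires.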