automatic decomposition of multiallelics
sigven committed Apr 20, 2017
1 parent 8bc41f1 commit f35d825
Showing 26 changed files with 230 additions and 251 deletions.
36 changes: 20 additions & 16 deletions README.md
@@ -2,20 +2,23 @@

### Overview

The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package intended for analysis and clinical interpretation of individual cancer genomes. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces HTML reports that can be navigated by clinical oncologists (Figure 1).
The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html) with oncology-relevant, up-to-date annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno), and produces interactive HTML reports intended for clinical interpretation (Figure 1).

![PCGR overview](PCGR_workflow.png)

### Example reports
* <a href="http://folk.uio.no/sigven/tumor_sample.COAD.pcgr.html" target="_blank">View an example report for a colorectal tumor sample (TCGA)</a>
* <a href="http://folk.uio.no/sigven/tumor_sample.BRCA.pcgr.html" target="_blank">View an example report for a breast tumor sample (TCGA)</a>
* <a href="http://folk.uio.no/sigven/tumor_sample.COAD.pcgr.html" target="_blank">Report for a colorectal tumor sample (TCGA)</a>
* <a href="http://folk.uio.no/sigven/tumor_sample.BRCA.pcgr.html" target="_blank">Report for a breast tumor sample (TCGA)</a>

### PCGR documentation

[![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)

If you use PCGR, please cite our paper:

### Annotation resources included in PCGR (v0.3.2)
Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aasheim, and Eivind Hovig. __Personal Cancer Genome Reporter: Variant Interpretation Report For Precision Oncology__ (2017). bioRxiv. doi:[10.1101/122366](https://doi.org/10.1101/122366)

### Annotation resources included in PCGR (v0.3.3)

* [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
* [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,16 +56,16 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/

#### STEP 2: Download PCGR

<font color="red"><b>April 19th 2017</b>: New release (0.3.2)</font>
<font color="red"><b>April 20th 2017</b>: New release (0.3.3)</font>

1. Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest)
1. Download and unpack the [latest release (0.3.3)](https://github.com/sigven/pcgr/releases/latest)
2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3.2`)
* Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3.3`)
* Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`

A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced.
3. Pull the [PCGR Docker image (0.3.2)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
* `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine)
3. Pull the [PCGR Docker image (0.3.3)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.2Gb):
* `docker pull sigven/pcgr:0.3.3` (PCGR annotation engine)
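For scripted setups, the unpacking step above (`gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`) can be sketched with Python's standard `tarfile` module. This is a minimal, unofficial sketch: the helper name `unpack_data_bundle` is hypothetical and not part of PCGR, and it assumes the bundle's top-level entry is the `data/` folder that the step above says should appear inside the PCGR directory.

```python
import tarfile
from pathlib import Path


def unpack_data_bundle(bundle_path: str, pcgr_dir: str) -> Path:
    """Extract a .tgz data bundle into pcgr_dir and return the resulting data/ folder.

    Hypothetical helper; roughly equivalent to running
    `gzip -dc <bundle> | tar xvf -` inside pcgr_dir.
    """
    target = Path(pcgr_dir).expanduser()
    with tarfile.open(bundle_path, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: block members that would be written outside target.
            tar.extractall(path=target, filter="data")
        else:
            tar.extractall(path=target)
    # Assumption: the bundle's top-level entry is a data/ folder.
    data_dir = target / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"no data/ folder found in {target} after unpacking")
    return data_dir
```

For example, `unpack_data_bundle("pcgr.databundle.GRCh37.YYYYMMDD.tgz", "~/pcgr-0.3.3")`, with _YYYYMMDD_ replaced by the actual bundle date. The `"data"` extraction filter is only used when the running Python provides it; older interpreters fall back to a plain extraction, so only unpack bundles from a trusted source.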

#### STEP 3: Input preprocessing

@@ -73,12 +76,9 @@ The PCGR workflow accepts two types of input files:

PCGR can be run with either or both of the two input files present.

The following requirements __MUST__ be met by the input VCF for PCGR to work properly:
* We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
* If the input VCF contains multi-allelic sites, these will be subject to [decomposition](http://genome.sph.umich.edu/wiki/Vt#Decompose)

1. Variants in the raw VCF that contain multiple alternative alleles (e.g. "multiple ALTs") must be split into variants with a single alternative allele. This can be done with the help of either [vt decompose](http://genome.sph.umich.edu/wiki/Vt#Decompose) or [vcflib's vcfbreakmulti](https://github.com/vcflib/vcflib#vcflib). We will add integrated support for this in an upcoming release
2. The contents of the VCF must be sorted correctly (i.e. according to chromosomal order and chromosomal position). This can be obtained by [vcftools](https://vcftools.github.io/perl_module.html#vcf-sort).
* We strongly recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
* 'chr' must be stripped from the chromosome names
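To illustrate what decomposing a multi-allelic site means, here is a deliberately simplified Python sketch. It is not PCGR's implementation and not equivalent to `vt decompose`: the function name is hypothetical, and it only splits the comma-separated ALT column of one VCF data line into one line per alternative allele. All other columns are copied unchanged, so per-allele INFO values (`Number=A`/`Number=R`) and sample genotypes are *not* re-indexed, which a real decomposer has to handle.

```python
from typing import List


def decompose_multiallelic(record: str) -> List[str]:
    """Split one tab-separated VCF data line with k ALT alleles into k lines.

    Simplified sketch: only the ALT column (index 4) is split; every other
    column, including INFO and any FORMAT/sample columns, is copied as-is.
    """
    fields = record.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    decomposed = []
    for alt in fields[4].split(","):
        out = fields.copy()
        out[4] = alt  # one alternative allele per output record
        decomposed.append("\t".join(out))
    return decomposed
```

For instance, a site with `REF=A` and `ALT=C,T` becomes two records, one with `ALT=C` and one with `ALT=T`; a biallelic line is returned unchanged as a single-element list.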

The tab-separated values file with copy number aberrations __MUST__ contain the following four columns:
* Chromosome
@@ -112,7 +112,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t

positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
e.g. ~/pcgr-0.3.2
e.g. ~/pcgr-0.3.3
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -146,7 +146,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t

The _examples_ folder contains sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:

`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`
`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD`
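The same example run can be scripted from Python with `subprocess`. The sketch below only assembles the arguments shown in the usage text (optional `--input_vcf` and `--input_cna_segments`, then the positional `pcgr_dir`, `output_dir` and `sample_id`). The helper names are hypothetical, and the sketch assumes it is run from the directory containing `pcgr.py`, with `python` on the `PATH` pointing at the interpreter PCGR was tested with.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pcgr_command(pcgr_dir: str, output_dir: str, sample_id: str,
                       input_vcf: Optional[str] = None,
                       input_cna_segments: Optional[str] = None) -> List[str]:
    """Return the argv for one pcgr.py run (hypothetical helper, not part of PCGR)."""
    # PCGR runs with either or both input files, so require at least one.
    if input_vcf is None and input_cna_segments is None:
        raise ValueError("provide input_vcf, input_cna_segments, or both")
    cmd = ["python", "pcgr.py"]
    if input_vcf is not None:
        cmd += ["--input_vcf", input_vcf]
    if input_cna_segments is not None:
        cmd += ["--input_cna_segments", input_cna_segments]
    # No shell is involved when a list is passed to subprocess, so expand "~" here.
    cmd += [str(Path(pcgr_dir).expanduser()),
            str(Path(output_dir).expanduser()),
            sample_id]
    return cmd


def run_pcgr(cmd: List[str]) -> None:
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Calling `run_pcgr(build_pcgr_command("~/pcgr-0.3.3", "~/pcgr-0.3.3/examples", "tumor_sample.COAD", input_vcf="tumor_sample.COAD.vcf.gz", input_cna_segments="tumor_sample.COAD.cna.tsv"))` mirrors the shell command above. Passing a list rather than one string avoids shell quoting problems with paths that contain spaces.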

This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:

@@ -157,3 +157,7 @@ This command will run the Docker-based PCGR workflow and produce the following o
5. __tumor_sample.COAD.pcgr.mutational_signatures.tsv__ - Tab-separated values file with estimated contributions by known mutational signatures and associated underlying etiologies
6. __tumor_sample.COAD.pcgr.snvs_indels.biomarkers.tsv__ - Tab-separated values file with clinical evidence items associated with biomarkers for diagnosis, prognosis or drug sensitivity/resistance
7. __tumor_sample.COAD.pcgr.cna_segments.tsv.gz__ - Tab-separated values file with annotations of gene transcripts that overlap with somatic copy number aberrations

## Contact

sigven@ifi.uio.no
Binary file modified docs/_build/doctrees/about.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/getting_started.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/output.doctree
Binary file not shown.
31 changes: 25 additions & 6 deletions docs/_build/html/_sources/about.rst.txt
@@ -5,14 +5,15 @@ What is the Personal Cancer Genome Reporter (PCGR)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Personal Cancer Genome Reporter (PCGR) is a stand-alone software
package intended for analysis and clinical interpretation of individual
cancer genomes. It interprets both somatic SNVs/InDels and copy number
aberrations. The software extends basic gene and variant annotations
from the `Ensembl’s Variant Effect Predictor
package for functional annotation and translation of individual cancer
genomes for precision oncology. It interprets both somatic SNVs/InDels
and copy number aberrations. The software extends basic gene and variant
annotations from the `Ensembl’s Variant Effect Predictor
(VEP) <http://www.ensembl.org/info/docs/tools/vep/index.html>`__ with
oncology-relevant, up-to-date annotations retrieved flexibly through
`vcfanno <https://github.com/brentp/vcfanno>`__, and produces HTML
reports that can be navigated by clinical oncologists (Figure 1).
`vcfanno <https://github.com/brentp/vcfanno>`__, and produces
interactive HTML reports intended for clinical interpretation (Figure
1).

.. figure:: PCGR_workflow.png
:alt:
@@ -22,6 +23,12 @@ affiliated with the `Norwegian Cancer Genomics
Consortium <http://cancergenomics.no>`__, at the `Institute for Cancer
Research/Oslo University Hospital <http://radium.no>`__.

Example reports
^^^^^^^^^^^^^^^

- Report for a colorectal tumor sample (TCGA)
- Report for a breast tumor sample (TCGA)

Why use PCGR?
~~~~~~~~~~~~~

@@ -37,6 +44,13 @@ and variant level. The application generates a tiered report that will
aid the interpretation of individual cancer genomes in a clinical
setting.

If you use PCGR, please cite our paper:

Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aasheim, and
Eivind Hovig. **Personal Cancer Genome Reporter: Variant Interpretation
Report For Precision Oncology** (2017). bioRxiv.
doi:\ `10.1101/122366 <https://doi.org/10.1101/122366>`__

Docker-based technology
~~~~~~~~~~~~~~~~~~~~~~~

@@ -50,3 +64,8 @@ for precision oncology <annotation_resources.html>`__.

.. figure:: docker-logo50.png
:alt:

Contact
~~~~~~~

sigven@ifi.uio.no
16 changes: 8 additions & 8 deletions docs/_build/html/_sources/getting_started.rst.txt
@@ -42,18 +42,18 @@ terminal window.
Download PCGR
^^^^^^^^^^^^^

**April 19th 2017**: New release (0.3.2)
**April 20th 2017**: New release (0.3.3)

- Download and unpack the `latest release
(0.3.2) <https://github.com/sigven/pcgr/releases/latest>`__
(0.3.3) <https://github.com/sigven/pcgr/releases/latest>`__

- Download and unpack the data bundle (approx. 17Gb) in the PCGR
directory

- Download `the latest data
bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
version number, e.g. ``~/pcgr-0.3.2``)
version number, e.g. ``~/pcgr-0.3.3``)
- Decompress and untar the bundle, e.g. through the following Unix
command:
``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
@@ -62,10 +62,10 @@ Download PCGR
have been produced

- Pull the `PCGR Docker image -
0.3.2 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
(3.1Gb) :
0.3.3 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
(3.2Gb) :

- ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine)
- ``docker pull sigven/pcgr:0.3.3`` (PCGR annotation engine)

Run test - generation of clinical report for a cancer genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script

positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
e.g. ~/pcgr-0.3.2
e.g. ~/pcgr-0.3.3
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be
generated by running the following command in your terminal window:

``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD``
``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.3 ~/pcgr-0.3.3/examples tumor_sample.COAD``

This command will run the Docker-based PCGR workflow and produce the
following output files in the *examples* folder:
22 changes: 5 additions & 17 deletions docs/_build/html/_sources/output.rst.txt
@@ -18,23 +18,11 @@ currently supported.
VCF
^^^

The following requirements **MUST** be met by the input VCF for PCGR to
work properly:

1. Variants in the raw VCF that contain multiple alternative alleles
(e.g. "multiple ALTs") must be split into variants with a single
alternative allele. This can be done with the help of either `vt
decompose <http://genome.sph.umich.edu/wiki/Vt#Decompose>`__ or
`vcflib's vcfbreakmulti <https://github.com/vcflib/vcflib#vcflib>`__.
We will add integrated support for this in an upcoming release
2. The contents of the VCF must be sorted correctly (i.e. according to
chromosomal order and chromosomal position). This can be obtained by
`vcftools <https://vcftools.github.io/perl_module.html#vcf-sort>`__.

- We **strongly** recommend that the input VCF is compressed and
indexed using `bgzip <http://www.htslib.org/doc/tabix.html>`__ and
`tabix <http://www.htslib.org/doc/tabix.html>`__
- 'chr' must be stripped from the chromosome names
- We **strongly** recommend that the input VCF is compressed and
indexed using `bgzip <http://www.htslib.org/doc/tabix.html>`__ and
`tabix <http://www.htslib.org/doc/tabix.html>`__
- If the input VCF contains multi-allelic sites, these will be subject
to `decomposition <http://genome.sph.umich.edu/wiki/Vt#Decompose>`__

**IMPORTANT NOTE 1**: Considering the VCF output for the `numerous
somatic SNV/InDel callers <https://www.biostars.org/p/19104/>`__ that
35 changes: 28 additions & 7 deletions docs/_build/html/about.html
@@ -89,9 +89,13 @@
<p class="caption"><span class="caption-text">Table of Contents</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">About</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#what-is-the-personal-cancer-genome-reporter-pcgr">What is the Personal Cancer Genome Reporter (PCGR)?</a></li>
<li class="toctree-l2"><a class="reference internal" href="#what-is-the-personal-cancer-genome-reporter-pcgr">What is the Personal Cancer Genome Reporter (PCGR)?</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#example-reports">Example reports</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#why-use-pcgr">Why use PCGR?</a></li>
<li class="toctree-l2"><a class="reference internal" href="#docker-based-technology">Docker-based technology</a></li>
<li class="toctree-l2"><a class="reference internal" href="#contact">Contact</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="getting_started.html">Getting started</a></li>
@@ -147,21 +151,29 @@ <h1>About<a class="headerlink" href="#about" title="Permalink to this headline">
<div class="section" id="what-is-the-personal-cancer-genome-reporter-pcgr">
<h2>What is the Personal Cancer Genome Reporter (PCGR)?<a class="headerlink" href="#what-is-the-personal-cancer-genome-reporter-pcgr" title="Permalink to this headline"></a></h2>
<p>The Personal Cancer Genome Reporter (PCGR) is a stand-alone software
package intended for analysis and clinical interpretation of individual
cancer genomes. It interprets both somatic SNVs/InDels and copy number
aberrations. The software extends basic gene and variant annotations
from the <a class="reference external" href="http://www.ensembl.org/info/docs/tools/vep/index.html">Ensembl’s Variant Effect Predictor
package for functional annotation and translation of individual cancer
genomes for precision oncology. It interprets both somatic SNVs/InDels
and copy number aberrations. The software extends basic gene and variant
annotations from the <a class="reference external" href="http://www.ensembl.org/info/docs/tools/vep/index.html">Ensembl’s Variant Effect Predictor
(VEP)</a> with
oncology-relevant, up-to-date annotations retrieved flexibly through
<a class="reference external" href="https://github.com/brentp/vcfanno">vcfanno</a>, and produces HTML
reports that can be navigated by clinical oncologists (Figure 1).</p>
<a class="reference external" href="https://github.com/brentp/vcfanno">vcfanno</a>, and produces
interactive HTML reports intended for clinical interpretation (Figure
1).</p>
<div class="figure">
<img alt="" src="_images/PCGR_workflow.png" />
</div>
<p>The Personal Cancer Genome Reporter has been developed by scientists
affiliated with the <a class="reference external" href="http://cancergenomics.no">Norwegian Cancer Genomics
Consortium</a>, at the <a class="reference external" href="http://radium.no">Institute for Cancer
Research/Oslo University Hospital</a>.</p>
<div class="section" id="example-reports">
<h3>Example reports<a class="headerlink" href="#example-reports" title="Permalink to this headline"></a></h3>
<ul class="simple">
<li>Report for a colorectal tumor sample (TCGA)</li>
<li>Report for a breast tumor sample (TCGA)</li>
</ul>
</div>
</div>
<div class="section" id="why-use-pcgr">
<h2>Why use PCGR?<a class="headerlink" href="#why-use-pcgr" title="Permalink to this headline"></a></h2>
@@ -176,6 +188,11 @@ <h2>Why use PCGR?<a class="headerlink" href="#why-use-pcgr" title="Permalink to
and variant level. The application generates a tiered report that will
aid the interpretation of individual cancer genomes in a clinical
setting.</p>
<p>If you use PCGR, please cite our paper:</p>
<p>Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aasheim, and
Eivind Hovig. <strong>Personal Cancer Genome Reporter: Variant Interpretation
Report For Precision Oncology</strong> (2017). bioRxiv.
doi:<a class="reference external" href="https://doi.org/10.1101/122366">10.1101/122366</a></p>
</div>
<div class="section" id="docker-based-technology">
<h2>Docker-based technology<a class="headerlink" href="#docker-based-technology" title="Permalink to this headline"></a></h2>
@@ -190,6 +207,10 @@ <h2>Docker-based technology<a class="headerlink" href="#docker-based-technology"
<img alt="" src="_images/docker-logo50.png" />
</div>
</div>
<div class="section" id="contact">
<h2>Contact<a class="headerlink" href="#contact" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="mailto:sigven&#37;&#52;&#48;ifi&#46;uio&#46;no">sigven<span>&#64;</span>ifi<span>&#46;</span>uio<span>&#46;</span>no</a></p>
</div>
</div>

