3. CancerKB and Annotations Databases

The AMP Guidelines paper provides several tables of public and proprietary databases that contain information about variant frequencies, known somatic mutations, functional predictions, and treatment and clinical trial information. These categorized lists are meant as a non-exhaustive survey of resources that clinical labs may reference when following the AMP guidelines. VarSeq and VSClinical include all of the publicly available sources and many of the proprietary ones (some proprietary sources are not available through commercial licensing).

The first table from the AMP paper lists population databases that can be used to distinguish common variants from rare variants.

Figure 3-1: The population databases used by AMP Guidelines analyses.
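To make the common-versus-rare distinction concrete, here is one simple way population frequencies could be applied. It is a minimal sketch under stated assumptions, not VarSeq's filtering logic: the function name `classify_by_population_af`, the input layout (a mapping from source name to allele frequency), and the 1% cutoff are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: label a variant "common" or "rare" from population allele
# frequencies. The 1% cutoff is an assumed example value, not a VarSeq default.
from typing import Mapping, Optional


def classify_by_population_af(
    allele_freqs: Mapping[str, Optional[float]],
    threshold: float = 0.01,
) -> str:
    """Return "common" if any source reports an allele frequency at or above
    the threshold, otherwise "rare".

    allele_freqs maps a source/population name to its allele frequency;
    None means the variant was not observed in that source.
    """
    observed = [af for af in allele_freqs.values() if af is not None]
    if not observed:
        # Absent from every population database: treat as rare.
        return "rare"
    # Use the highest frequency across sources (conservative choice).
    return "common" if max(observed) >= threshold else "rare"


# Example: seen at 0.02% in one source, absent from another.
print(classify_by_population_af({"1000 Genomes": 0.0002, "ExAC": None}))
```

Taking the maximum frequency across sources is a conservative design choice: a variant is treated as common if any population database reports it at or above the cutoff. In the example above, the variant is labeled rare.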

Many cancer variants recur in specific cancer types, and somatic catalogs make it possible to annotate how many samples, and in which tumor types, a variant has previously been observed. Clinical databases such as CIViC and PMKB catalog evidence statements and clinical interpretations for mutations in specific tumor types. Finally, other cancer-specific sources, such as mutation hotspots and clinical trials for relevant cancer drugs, are also referenced.

Figure 3-2: Cancer-specific databases used.
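As an illustration of the kind of summary a somatic catalog supports, the following sketch counts the distinct samples carrying a variant and tallies them by tumor type. The record layout of (sample ID, variant, tumor type) tuples and the function name `summarize_observations` are hypothetical; real catalogs have their own schemas, and this is not how VarSeq annotates them.

```python
# Hypothetical sketch: summarize how often a variant was seen in a somatic
# catalog, broken down by tumor type. The record layout is an assumption.
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_observations(
    records: Iterable[Tuple[str, str, str]], variant: str
) -> Tuple[int, Counter]:
    """records are (sample_id, variant, tumor_type) tuples.

    Returns the number of distinct samples carrying the variant and a
    Counter of tumor types across those samples. Assumes each sample has
    a single tumor type (if a sample repeats, its last tumor type is kept).
    """
    tumor_by_sample: Dict[str, str] = {}
    for sample_id, var, tumor_type in records:
        if var == variant:
            # Keying on sample ID counts a duplicated sample only once.
            tumor_by_sample[sample_id] = tumor_type
    return len(tumor_by_sample), Counter(tumor_by_sample.values())


records = [
    ("s1", "BRAF V600E", "melanoma"),
    ("s2", "BRAF V600E", "colorectal"),
    ("s3", "BRAF V600E", "melanoma"),
    ("s1", "BRAF V600E", "melanoma"),  # duplicate entry for s1
    ("s4", "KRAS G12D", "pancreatic"),
]
n_samples, by_tumor = summarize_observations(records, "BRAF V600E")
print(n_samples, dict(by_tumor))
```

For these example records, BRAF V600E is found in three distinct samples: two melanoma and one colorectal. The duplicated entry for `s1` is counted only once.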

Sequence repositories are used for sequence alignments and the definition of genes and transcripts on the reference sequence.

Figure 3-3: The list of sequence repository annotation sources.

We also support clinical, drug, and prediction annotations such as DrugBank, ClinVar, the Clinical Genomics Database, and the Genetics Home Reference. DrugBank in particular provides critical information about FDA-approved drugs with indications for a given gene or biomarker.

Figure 3-4: Clinical, drug, and prediction sources.

The next group of annotations is used to highlight splice site regions and incorporate functional prediction scores for possible sequence disruption.

Figure 3-5: Splice site and functional prediction sources.

An additional feature of the VSClinical AMP Guidelines is the Golden Helix CancerKB catalog, which is accessible to any Golden Helix user who purchases the AMP Guidelines. This catalog is a manually curated dataset of assessments of biomarkers and genes in the context of specific cancers, including information on the gene, the biomarker, and available treatments. It is built by an expert panel of curators and clinical professionals who aggregate and write up interpretations of the most commonly seen biomarkers and genes. For example, CancerKB includes interpretations for the clinically relevant BRAF V600E mutation in melanoma. CancerKB interpretations are a good starting point, and as you save your own lab's interpretations, your internal knowledgebase will grow to cover more of the biomarkers seen in each new sample. Users of the AMP feature can also choose to share their interpretations anonymously with the Golden Helix curators. CancerKB is updated on a regular basis to serve as an ever-growing cancer resource.

Figure 3-6: The ACMG Sample Classifier algorithm selection.

The CancerKB catalog can serve as a starting point for a lab to finalize an interpretation and streamline progress toward the final report, as the following examples show.