1. CNV Calling Algorithm Overview

VarSeq® software supports calling CNVs from coverage data computed from imported BAM files. This tutorial focuses on calling and interpreting CNVs using VarSeq.

In this tutorial, we will begin by opening an existing project containing computed coverage data for a number of cancer gene panel samples. Using this coverage data, we will call CNVs, plot the CNV data, and interpret the results.

The project files are contained in the ZIP folder that accompanies this tutorial. This project contains variant and coverage data for 48 samples over 31 tumor suppressor genes and oncogenes that are frequently mutated in myeloid malignancies. After the ZIP folder has been downloaded, extract the contents to a convenient location.

The VarSeq CNV calling algorithm relies on coverage information computed from BAM files. The algorithm uses changes in coverage relative to a collection of reference samples as evidence of CNV events. Using these reference samples, the algorithm computes two evidence metrics: Z-score and Ratio. The Z-score measures how many standard deviations the sample of interest lies from the reference sample mean, while the Ratio is the normalized mean for the sample of interest divided by the average normalized mean for the reference samples. The utility of these metrics can be seen by looking at the duplication event shown below.

Example Duplication

Figure 1-1. Ratio and Z-score for a section of BRCA2

In the figure above, the spike in both Z-score and Ratio over four exons of this gene provides supporting evidence for the called Duplication event.
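The two metrics described above can be sketched for a single target region as follows. This is only an illustration of the definitions given in the text, not VarSeq's implementation: the function name `cnv_evidence`, the use of the sample standard deviation, and the example coverage values are all assumptions.

```python
from statistics import mean, stdev


def cnv_evidence(sample_cov, reference_covs):
    """Return (z_score, ratio) for one target region.

    sample_cov     -- normalized coverage of the sample of interest
    reference_covs -- normalized coverages of the reference samples
    """
    ref_mean = mean(reference_covs)
    # Assumption: sample standard deviation across the reference samples.
    ref_sd = stdev(reference_covs)
    # Z-score: distance from the reference mean in standard deviations.
    z_score = (sample_cov - ref_mean) / ref_sd
    # Ratio: sample coverage relative to the average reference coverage.
    ratio = sample_cov / ref_mean
    return z_score, ratio


# Hypothetical values: a sample with about 1.5x the reference coverage,
# as would be expected for a single-copy gain in a diploid region.
z, ratio = cnv_evidence(1.5, [0.9, 1.0, 1.1, 1.0])
print(f"Z-score = {z:.2f}, Ratio = {ratio:.2f}")
```

A Ratio well above 1 combined with a large positive Z-score is the pattern the figure shows over the duplicated exons; a Ratio well below 1 with a large negative Z-score would point toward a deletion.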

A third metric used by the CNV caller is Variant Allele Frequency (VAF). While VAF is not a primary metric used for identification of CNVs, it can provide supporting evidence for or against certain types of events. For example, values other than 0 or 1 are evidence against heterozygous deletion events, while values of 1/3 and 2/3 provide supporting evidence for duplications. The advantage provided by VAF can be seen in the figure below.

Figure 1-2. VAF as Evidence Against Deletions

In the above figure, two exons were called as deletions prior to utilizing VAF. However, the presence of two variants with VAF of 0.5 within the region provides the algorithm with evidence against a deletion, allowing us to successfully classify the exons as diploid.
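The VAF reasoning can be illustrated with a small check over the variants that fall inside a candidate region. This is a sketch under stated assumptions: the function name `vaf_evidence`, the returned keys, and the tolerance `tol` are hypothetical and are not taken from VarSeq.

```python
def vaf_evidence(vafs, tol=0.1):
    """Summarize what the VAFs in a region suggest about a CNV call.

    vafs -- variant allele frequencies (0.0 to 1.0) within the region
    tol  -- hypothetical tolerance for matching an expected VAF
    """
    def near(value, targets):
        return any(abs(value - t) <= tol for t in targets)

    # A heterozygous deletion leaves one copy, so every variant should
    # have a VAF near 0 or 1; any other value argues against it.
    against_deletion = any(not near(v, (0.0, 1.0)) for v in vafs)
    # With three copies, heterozygous variants sit near 1/3 or 2/3.
    supports_duplication = any(near(v, (1 / 3, 2 / 3)) for v in vafs)
    return {
        "against_deletion": against_deletion,
        "supports_duplication": supports_duplication,
    }


# Two variants at VAF 0.5, as in the figure: evidence against a deletion.
print(vaf_evidence([0.5, 0.5]))
```

In this sketch a single off-target VAF is enough to count as evidence; a real caller would likely weigh such evidence against read depth and the number of variants observed.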

Using these three metrics, the algorithm assigns a CNV state to each target region and then merges these regions to obtain contiguous CNV events.
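The merging step can be pictured as collapsing runs of adjacent target regions that share a non-diploid state. The sketch below assumes the regions are already sorted by position and that states are plain strings; the tuple layout, the state names, and the function `merge_regions` are illustrative assumptions, and the actual algorithm may merge regions differently.

```python
from itertools import groupby


def merge_regions(regions):
    """Merge consecutive regions with the same CNV state into events.

    regions -- list of (chrom, start, end, state), sorted by position
    Returns a list of (chrom, start, end, state, n_regions) tuples for
    every run of regions whose state is not "Diploid".
    """
    events = []
    # groupby only groups consecutive items, which is what we want here.
    for (chrom, state), run in groupby(regions, key=lambda r: (r[0], r[3])):
        run = list(run)
        if state != "Diploid":
            events.append((chrom, run[0][1], run[-1][2], state, len(run)))
    return events


# Hypothetical coordinates for five target regions on one chromosome.
regions = [
    ("13", 100, 200, "Diploid"),
    ("13", 300, 400, "Duplication"),
    ("13", 500, 600, "Duplication"),
    ("13", 700, 800, "Diploid"),
    ("13", 900, 1000, "Deletion"),
]
for event in merge_regions(regions):
    print(event)
```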

Once a set of CNV events has been called, quality control flagging is performed to identify unreliable samples and potentially problematic CNV calls. These QC flags are applied to both CNV events and samples.

The following are examples of CNV event flags:

  • Low reference sample read depth in the surrounding region;
  • High variation in the region between reference samples; and
  • Ratio or Z-score that falls within the noise of the surrounding region.

The following are examples of sample flags:

  • Extremely high variation in the sample's metrics;
  • Very low mean depth; and
  • Significant difference from the selected reference samples.

By flagging these events and samples, we provide a second layer of heuristics, which can be used to reduce false positives and identify questionable CNV calls.
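As a rough illustration of how sample-level flags of this kind might be computed, the sketch below checks two of the conditions listed above. The function `sample_flags`, the flag labels, and both thresholds are hypothetical and chosen only for the example; they are not VarSeq's defaults, and the comparison against the reference samples is left out.

```python
from statistics import stdev


def sample_flags(ratios, mean_depth, max_ratio_sd=0.2, min_mean_depth=30.0):
    """Return a list of QC flag labels for one sample.

    ratios         -- per-region Ratio values for the sample
    mean_depth     -- mean read depth across the sample's target regions
    max_ratio_sd   -- hypothetical limit on the spread of Ratio values
    min_mean_depth -- hypothetical minimum acceptable mean depth
    """
    flags = []
    # Widely scattered Ratio values suggest a noisy, unreliable sample.
    if stdev(ratios) > max_ratio_sd:
        flags.append("High variation")
    # Very low coverage leaves too little signal to call CNVs reliably.
    if mean_depth < min_mean_depth:
        flags.append("Low mean depth")
    return flags


print(sample_flags([1.0, 1.02, 0.98, 1.01], mean_depth=250.0))
print(sample_flags([0.5, 1.6, 0.9, 1.4], mean_depth=12.0))
```

Calls from a flagged sample are not necessarily wrong, but they deserve closer review before interpretation.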