3.8.1. VarSeq CNV Caller

Clinical genetic testing often requires looking for and interpreting Copy Number Variants (CNVs) as well as small point mutations. While NGS-based gene panels and exome tests have become the best practice assay for many types of genetic tests, CNVs must be detected and analyzed using a different paradigm.

VarSeq software includes a CNV calling algorithm that operates on existing clinical NGS gene panel, exome, and whole genome data. CNV calling and the rest of the analysis are managed inside VarSeq’s clinical interpretation workflow. This integration enables CNV events to be considered alongside the annotated and filtered NGS small variants and incorporated into clinical reporting.

Algorithm Details

The CNV Calling algorithm was developed based on a combination of methods in the existing CNV literature and novel methods developed by our engineers. The algorithm relies on coverage information computed from BAM files and uses changes in coverage relative to a collection of reference samples as evidence of CNV events. Using these reference samples, the algorithm computes two evidence metrics: Z-score and Ratio. The Z-score measures the number of standard deviations from the reference sample mean, while the Ratio is the normalized mean for the sample of interest divided by the average normalized mean for the reference samples. The utility of these metrics can be seen by looking at the duplication event shown below.

Z-score and Ratio Example

In the figure above, the spike in both Z-score and Ratio over four exons of this gene provides supporting evidence for the called Duplication event.
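The two evidence metrics described above can be illustrated with a minimal sketch. This is not VarSeq's implementation: the function name `evidence_metrics`, the input layout, and the use of the sample standard deviation are assumptions made for illustration only.

```python
from statistics import mean, stdev


def evidence_metrics(sample_cov, reference_covs):
    """Return (z_score, ratio) for a single target region.

    sample_cov: normalized mean coverage of the sample of interest.
    reference_covs: normalized mean coverage of each reference sample.
    Hypothetical helper; names and inputs are illustrative assumptions.
    """
    ref_mean = mean(reference_covs)
    ref_sd = stdev(reference_covs)  # assumption: sample standard deviation
    if ref_sd == 0 or ref_mean == 0:
        raise ValueError("reference coverage has no spread or is zero")
    # Z-score: standard deviations away from the reference sample mean.
    z_score = (sample_cov - ref_mean) / ref_sd
    # Ratio: sample coverage divided by the average reference coverage.
    ratio = sample_cov / ref_mean
    return z_score, ratio


# A target with roughly 1.5x the reference coverage, as expected for a
# single-copy gain on a diploid background.
z, r = evidence_metrics(1.5, [0.9, 1.0, 1.1, 1.0])
print(f"z-score={z:.2f}, ratio={r:.2f}")
```

A large positive Z-score together with a ratio near 1.5 is the pattern the figure shows for a duplication; a ratio near 0.5 with a strongly negative Z-score would point toward a heterozygous deletion.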

A third metric used by the CNV caller is Variant Allele Frequency (VAF). While VAF is not a primary metric used for identification of CNVs, it can provide supporting evidence for or against certain types of events. For example, values other than 0 or 1 are evidence against heterozygous deletion events, while values of 1/3 and 2/3 provide supporting evidence for duplications. The advantage provided by VAF can be seen in the figure below.

VAF Example

In the above figure, two exons were called as deletions prior to utilizing VAF. However, the presence of two variants with VAF of 0.5 within the region provides the algorithm with evidence against a deletion, allowing us to successfully classify the exons as diploid.
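The VAF reasoning above can be sketched as a small check. The function name `vaf_evidence` and the tolerance `tol` are hypothetical; the source does not state how closely a VAF must match 0, 1, 1/3, or 2/3, so the value used here is an assumption.

```python
def vaf_evidence(vafs, tol=0.1):
    """Summarize what the VAFs of variants in a region suggest.

    vafs: variant allele frequencies (0.0 to 1.0) of variants in the region.
    tol: hypothetical tolerance around each expected value (an assumption).
    """
    def near(value, target):
        return abs(value - target) <= tol

    # Any VAF that is neither ~0 nor ~1 argues against a heterozygous deletion,
    # because only one allele should remain in a deleted region.
    against_het_deletion = any(
        not (near(v, 0.0) or near(v, 1.0)) for v in vafs
    )
    # VAFs near 1/3 or 2/3 are what three copies of a region would produce.
    supports_duplication = any(
        near(v, 1 / 3) or near(v, 2 / 3) for v in vafs
    )
    return {
        "against_het_deletion": against_het_deletion,
        "supports_duplication": supports_duplication,
    }


# Two variants at VAF 0.5, as in the figure: evidence against a deletion.
print(vaf_evidence([0.5, 0.5]))
```

With the default tolerance, a VAF of 0.5 counts against a heterozygous deletion without being mistaken for duplication support, which matches the example where the two exons are reclassified as diploid.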

Since these metrics can be noisy over very large regions, a segmentation algorithm is used to call large multi-gene and whole chromosome events. If a region contains many small CNV events, the segmentation algorithm is also used to segment that region, and small events that share a segmented region are merged.
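The merging step can be sketched as follows. The segmentation algorithm itself is not described here, so this sketch takes the segments as given; the function name `merge_events_by_segment`, the coordinate tuples, and the simplification that all events have the same CNV state are assumptions for illustration.

```python
def merge_events_by_segment(events, segments):
    """Merge small events that fall inside the same segment.

    events: list of (start, end) tuples, assumed to share one CNV state.
    segments: list of non-overlapping (start, end) tuples produced by
        some segmentation step (not implemented here).
    Returns a sorted list of (start, end) tuples.
    """
    merged = {}       # segment -> span covering all events inside it
    unassigned = []   # events not fully contained in any segment
    for ev_start, ev_end in events:
        for seg in segments:
            if seg[0] <= ev_start and ev_end <= seg[1]:
                lo, hi = merged.get(seg, (ev_start, ev_end))
                merged[seg] = (min(lo, ev_start), max(hi, ev_end))
                break
        else:
            # Keep events that straddle a segment boundary unchanged.
            unassigned.append((ev_start, ev_end))
    return sorted(list(merged.values()) + unassigned)


events = [(100, 200), (300, 400), (1500, 1600)]
segments = [(0, 1000), (1001, 2000)]
print(merge_events_by_segment(events, segments))
# → [(100, 400), (1500, 1600)]
```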

Once a set of CNV events has been called, quality control flagging is performed to identify unreliable samples and potentially problematic CNV calls. These QC flags are applied to both CNV events and samples.

The following are examples of CNV event flags:

  • Low Controls Depth: Mean read depth over controls fell below threshold.

  • High Controls Variation: Variation coefficient exceeded threshold.

  • Within Regional IQR: Event is not significantly different from surrounding normal regions based on regional IQR.

  • Low Z Score: Event has a low average z-score.

  • Insufficient Ratio: Event has an average ratio that is inconsistent with the CNV state.

  • Deletion Contains Heterozygous Variants: Every exon of the deletion contains multiple heterozygous variants.

  • Extreme GC Content: GC Content is below 0.30 or above 0.70.
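Of the event flags above, Extreme GC Content is the only one with stated cutoffs, so it makes a convenient illustration. The sketch below assumes the target's reference sequence is available as a string; the function names are hypothetical, and only the 0.30 and 0.70 thresholds come from the list above.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def has_extreme_gc(sequence, low=0.30, high=0.70):
    """Flag a target whose GC content is below 0.30 or above 0.70."""
    fraction = gc_content(sequence)
    return fraction < low or fraction > high


print(has_extreme_gc("GGGGCCCCAT"))  # 8 of 10 bases are G or C
print(has_extreme_gc("ATGCATGCAT"))  # 4 of 10 bases are G or C
```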

The following are examples of Sample flags:

  • High IQR: High interquartile range for Z-score and ratio. This flag indicates that there is high variance between targets for one or more of the evidence metrics.

  • High Median Z-score: The median of all the z-scores was above 0.4. This indicates a general skew of this sample away from the reference samples, likely to cause excessive duplication calls.

  • Low Sample Mean Depth: Sample mean depth below 30.

  • Mismatch to reference samples: Match score indicates low similarity to control samples.

  • Mismatch to non-autosomal reference samples: Match score indicates low similarity to non-autosomal control samples.

  • No coverage information: Not enough coverage information to call CNVs for the sample.

  • Few Gender Matches: Not enough reference samples with matching gender to call X and Y CNVs.

  • Fewer than two matched references: Fewer than two matched reference samples.
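Two of the sample flags above have explicit cutoffs (median z-score above 0.4 and mean depth below 30) and can be sketched directly. The function name `sample_flags` and its inputs are hypothetical; the remaining flags depend on thresholds and match scores that are not given here, so they are left out.

```python
from statistics import median


def sample_flags(z_scores, mean_depth):
    """Return the names of the threshold-based sample flags that apply.

    z_scores: per-target z-scores for the sample.
    mean_depth: mean read depth of the sample.
    Only the two flags with stated cutoffs are covered in this sketch.
    """
    flags = []
    if median(z_scores) > 0.4:
        # General skew away from the reference samples.
        flags.append("High Median Z-score")
    if mean_depth < 30:
        flags.append("Low Sample Mean Depth")
    return flags


print(sample_flags([0.5, 0.6, 0.7], mean_depth=25))
# → ['High Median Z-score', 'Low Sample Mean Depth']
print(sample_flags([0.0, 0.1, -0.1], mean_depth=50))
# → []
```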

By flagging these events and samples, we provide a second layer of heuristics, which can be used to reduce false positives and identify questionable CNV calls.