3.11.1. CNV Caller on Target Regions
This algorithm uses sample-level coverage statistics to detect copy number variations (CNVs). Each coverage region is classified as a homozygous deletion, heterozygous deletion, diploid, or duplication.
Reference samples are used to normalize the coverage data, and statistics are reported to summarize the evidence for each classification. This algorithm has been tested on gene panel data as well as whole exome data, and it can call events ranging from single-exon deletions to whole-chromosome duplications. The minimum and maximum reference sample counts can be configured; if you have a large number of control samples in your reference folder, we suggest increasing the maximum value. See VarSeq CNV Caller for more details.
With the “CNV Caller” add-on to your VarSeq license, you can add this algorithm to interactive workflows or to automated workflows executed with VSPipeline.
To add the CNV caller to your VarSeq license, contact email@example.com.
Coverage statistics must be computed before running this algorithm. For best results, we recommend at least 100x coverage and 30 reference samples. The current references can be viewed and edited in the Manage Reference Samples dialog; for more information, see VarSeq CNV Reference Manager.
The first tab in the dialog allows the user to specify the following options:
Sensitivity/Precision: Adjusts the trade-off between the true positive rate and the true negative rate.
Minimum Number of Reference Samples: Desired minimum number of reference samples to be selected.
Maximum Number of Reference Samples: The maximum number of reference samples to be selected.
Exclude reference samples with percent difference greater than: Filters out reference samples with a percent difference above the specified value, once a minimum of 10 samples have been selected.
Add samples to reference set: Adds the current project’s samples to the reference set. Go to Tools > Open Folder > Reference Samples Folder to see all the samples that have been added to your reference set over time.
Normalize sex chromosomes using only controls with matching sex: If this option is selected and there are non-autosomal (X and Y) targets, a gender is inferred for each sample, and a set of gender-matched references is selected for normalizing these chromosomes. Uncheck this option if you do not have enough samples for gender-matched normalization or if you expect all samples to be predominantly of one sex.
Reference Sample Folder: The folder containing the reference samples used to normalize the coverage data.
Controls average target mean depth below: Flags targets with average reference sample depth below the specified value.
Controls variation coefficient above: Flags targets for which the variation coefficient is above the specified value. A high variation coefficient indicates that there is extreme variation in reference sample coverage for the target region.
Targets Excluded From Normalization and CNV Calling: These targets are completely excluded from the CNV Calling process and will not be used during normalization or event calling.
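The reference selection options above can be sketched as follows. This is a minimal illustration under stated assumptions, not VarSeq's implementation: the `Candidate` class, the `select_references` function, and all parameter values are hypothetical, and each candidate's percent difference is assumed to be precomputed. The subset step mirrors the Size of the Reference Sample Subset option in the advanced tab (by default 2 times the maximum number of reference samples, chosen by closeness of total mean depth).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one sample in the reference folder.
    name: str
    total_mean_depth: float
    percent_difference: float  # assumed precomputed against the sample of interest


def select_references(sample_mean_depth, candidates, min_refs, max_refs,
                      max_percent_difference, subset_factor=2):
    """Return (selected references, whether the desired minimum was reached)."""
    # Restrict to the references whose total mean depth is closest to the sample.
    subset = sorted(candidates,
                    key=lambda c: abs(c.total_mean_depth - sample_mean_depth))
    subset = subset[: subset_factor * max_refs]

    # Assumption: consider the most similar references (lowest percent difference) first.
    subset.sort(key=lambda c: c.percent_difference)

    selected = []
    for cand in subset:
        if len(selected) >= max_refs:
            break
        # The percent-difference cutoff only applies once 10 samples are selected.
        # Candidates are sorted, so every later one would also fail the cutoff.
        if len(selected) >= 10 and cand.percent_difference > max_percent_difference:
            break
        selected.append(cand)
    return selected, len(selected) >= min_refs
```

For example, with 40 candidates whose percent differences are 0 through 39, `max_refs=15`, and a cutoff of 12, the first 13 candidates are kept: the first 10 unconditionally, then those whose percent difference does not exceed 12.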
The CNV calling algorithm relies on probability distributions associated with both the Z-score and Ratio metrics. The Z-score for a target measures the number of standard deviations a sample’s coverage is from the mean reference sample coverage, while the Ratio is the target coverage divided by the mean reference sample coverage.
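The two metrics can be illustrated with a short sketch. For illustration only, it assumes that each sample's target depths are normalized by dividing by that sample's mean depth across all targets; the normalization VarSeq actually applies (including options such as GC correction) is not specified here, and `normalize` and `z_scores_and_ratios` are hypothetical names. The Z-score and Ratio follow the definitions above and those listed for the Coverage Regions Table.

```python
from statistics import mean, stdev


def normalize(depths):
    # Assumption: scale each sample's target depths by that sample's overall mean depth.
    m = mean(depths)
    return [d / m for d in depths]


def z_scores_and_ratios(sample_depths, control_depths):
    """sample_depths: per-target depths for the sample of interest.
    control_depths: one per-target depth list per reference sample (at least 2)."""
    sample_norm = normalize(sample_depths)
    controls_norm = [normalize(c) for c in control_depths]
    results = []
    for t, x in enumerate(sample_norm):
        ctrl = [c[t] for c in controls_norm]
        mu, sd = mean(ctrl), stdev(ctrl)
        # Z-score: standard deviations between the sample and the control mean.
        z = (x - mu) / sd if sd > 0 else 0.0
        # Ratio: sample coverage divided by the control mean.
        ratio = x / mu if mu > 0 else 0.0
        results.append((z, ratio))
    return results
```

A heterozygous deletion shows up as a Ratio near 0.5 with a strongly negative Z-score; how negative depends on how tightly the reference samples agree at that target.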
Each metric is associated with three probability distributions, one for each type of CNV: Hom. Deletion, Het. Deletion, and Duplication.
Normal distributions are used for the deletion distributions, while log-normal distributions are used for the duplication distributions.
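A sketch of how a target's Ratio could be scored against the three distributions is shown below. The parameter values are hypothetical placeholders chosen only to illustrate the shapes (deletions centered near ratios of 0 and 0.5, duplications near 1.5) and are not VarSeq's defaults; how the caller combines these densities with the prior CNV probabilities is not shown. The Z-score distributions would be evaluated the same way with their own parameters.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def lognormal_pdf(x, mu, sigma):
    # The log-normal density is zero for non-positive values.
    if x <= 0:
        return 0.0
    return (math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))


# Hypothetical parameters for illustration only: (family, mu, sigma).
# For the log-normal, mu and sigma describe log(ratio).
RATIO_PARAMS = {
    "Hom. Deletion": ("normal", 0.0, 0.1),
    "Het. Deletion": ("normal", 0.5, 0.1),
    "Duplication": ("lognormal", math.log(1.5), 0.1),
}


def state_likelihoods(value, params):
    """Density of one metric value under each CNV-state distribution."""
    out = {}
    for state, (family, mu, sigma) in params.items():
        pdf = normal_pdf if family == "normal" else lognormal_pdf
        out[state] = pdf(value, mu, sigma)
    return out
```

With these placeholder parameters, a Ratio of 0.5 is most likely under Het. Deletion and a Ratio of 1.5 under Duplication.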
The parameters of the Z-score and Ratio distributions can be specified in the advanced tab, which contains the following options:
Prior CNV Probability: Specify the prior and transition probabilities for CNV state.
Z-score Parameters: Specify the parameters for the Z-score distributions.
Ratio Parameters: Specify the parameters for the Ratio distributions.
Utilize Variant Allele Frequency: Use Variant Allele Frequency when calling CNVs.
Use Optimal Segmentation Algorithm: Selects the slower optimal segmentation algorithm. If this option is not selected, regions will be segmented using Circular Binary Segmentation.
Signal Scaling for High Percent Difference: This option will compress the signal for regions outside of LoF and Trisomy events. The amount of signal compression is proportional to the sample’s percent difference.
Subset normalization targets in exomes to those containing het variants: This option causes the algorithm to use only targets containing variants with a heterozygous VAF when computing the mean coverage for normalization. It is recommended only for highly mutated tumor samples.
Perform GC Correction: Performs normalization based on GC content. Each target is assigned a bin based on its GC content and is normalized using the median coverage of its assigned GC-content bin. This normalization process can only be used on exome panels.
Size of the Reference Sample Subset: Reference samples are selected from a subset constructed based on how close each reference sample’s total mean depth is to the sample of interest. By default, the size of this subset is 2 times the maximum number of reference samples.
Percent of Targets to Force Aneuploid Call: If the percentage of deleted or duplicated targets in a chromosome exceeds this threshold, then the entire chromosome will be called as an aneuploid deletion or duplication event.
Sample Type: The sample type can be set to either Auto, Gene Panel, or Exome. If set to Auto, the algorithm will infer the sample type as Exome if and only if the number of targets exceeds 40,000.
Regions Ignored During Normalization: A blacklist region file specifies regions to exclude from the normalization process. These regions are still called by the CNV caller but are not used for normalization. Generally, this option should not be used, because the Targets Excluded From Normalization and CNV Calling option is the preferred method for excluding targets from CNV calling.
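Three of the advanced options above reduce to simple rules, sketched here under stated assumptions. The GC bin width, the function names, and the choice to test deletions and duplications against the aneuploid threshold separately are assumptions for illustration; the 40,000-target cutoff for the Auto sample type comes from the option description.

```python
import math
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_contents, bin_width=0.05):
    # Assumption: fixed-width GC bins; the real bin boundaries are not documented here.
    keys = [math.floor(gc / bin_width) for gc in gc_contents]
    bins = defaultdict(list)
    for key, depth in zip(keys, depths):
        bins[key].append(depth)
    medians = {key: median(values) for key, values in bins.items()}
    # Normalize each target by the median coverage of its own GC bin.
    return [depth / medians[key] if medians[key] > 0 else 0.0
            for key, depth in zip(keys, depths)]


def force_aneuploid(target_states, threshold_pct):
    """Return the whole-chromosome call, or None, for one chromosome's target states."""
    n = len(target_states)
    if n == 0:
        return None
    deleted = sum(s in ("Hom. Deletion", "Het. Deletion") for s in target_states)
    duplicated = sum(s == "Duplication" for s in target_states)
    if 100.0 * deleted / n > threshold_pct:
        return "Deletion"
    if 100.0 * duplicated / n > threshold_pct:
        return "Duplication"
    return None


def infer_sample_type(n_targets, setting="Auto"):
    if setting != "Auto":
        return setting
    # Auto: Exome if and only if the number of targets exceeds 40,000.
    return "Exome" if n_targets > 40_000 else "Gene Panel"
```

Because each target is divided by its own bin's median, a GC-rich bin with systematically higher coverage no longer inflates the normalized depth of its targets.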
Output of the CNVs Table
The CNV Caller algorithm will generate a CNVs table view. This table will include records for all called CNV events.
Region: Genomic coordinates (Chr: Start-Stop)
Type: Type of the CNV (Gain or Loss)
# Targets: Number of targets in the event.
# Samples: Number of samples in the event.
Span: The width of the event. Computed from the difference between the stop and start positions.
CNV State: State of the CNV event: one of Deletion, Het Deletion, Duplicate, or CN LoH.
Flags: QC warnings for the event.
Low Controls Depth: Mean read depth over controls fell below the threshold.
High Controls Variation: Variation coefficient exceeded threshold.
Within Regional IQR: Event is not significantly different from surrounding normal regions based on regional IQR.
Low Z Score: Event has a low average z-score.
Insufficient Ratio: Event has an average ratio that is inconsistent with the CNV state.
Deletion Contains Heterozygous Variants: Every exon of the deletion contains multiple heterozygous variants.
Extreme GC Content: GC Content is below 0.30 or above 0.70.
Avg Target Mean Depth: Average mean depth of the targets in this event as reported by Coverage Statistics.
Avg Z Score: Average Z-score of the event.
Avg Ratio: Average ratio of the event.
Variants Considered: Number of variants in the event considered for VAF content.
Supporting LoH Variants: Total number of variants within an LoH event supporting the called CNV state.
Karyotype: Cytogenetic nomenclature for this event.
GC Content: GC content of the event regions.
p-value: Probability that z-scores at least as extreme as those in the event would occur by chance in a diploid region.
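Some of these fields can be derived directly from an event's targets. The sketch below computes Span and the Extreme GC Content flag as described above. The p-value function is an assumption for illustration only, since the exact test VarSeq uses is not documented here: it treats the event's target Z-scores as independent standard normal values in a diploid region and returns the two-sided probability of a mean Z-score at least as extreme. `Event`, `span`, `extreme_gc`, and `diploid_p_value` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical minimal event record.
    chrom: str
    start: int
    stop: int
    z_scores: list
    gc_content: float


def span(event):
    # Span is the difference between the stop and start positions.
    return event.stop - event.start


def extreme_gc(event):
    # Extreme GC Content flag: GC content below 0.30 or above 0.70.
    return event.gc_content < 0.30 or event.gc_content > 0.70


def diploid_p_value(z_scores):
    # Assumption: in a diploid region, target Z-scores are independent N(0, 1),
    # so the mean of n Z-scores is N(0, 1/n). Return the two-sided tail probability.
    n = len(z_scores)
    mean_z = sum(z_scores) / n
    return math.erfc(abs(mean_z) * math.sqrt(n) / math.sqrt(2))
```

Under this assumption, more targets with the same average Z-score give a smaller p-value, which matches the intuition that longer consistent events are stronger evidence.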
Output in the Samples Table
Summary fields are appended to the Samples Table. These fields provide summary information computed across all of the CNVs.
Sample Flags: QC warnings for the samples
High IQR: High interquartile range for Z-score and ratio. This flag indicates that there is high variance between targets for one or more of the evidence metrics.
High Median Z-score: The median of all the Z-scores was above 0.4. This indicates a general skew of this sample away from the reference samples, which is likely to cause excessive duplication calls.
Low Sample Mean Depth: Sample mean depth below 30.
Mismatch to reference samples: Match score indicates low similarity to control samples.
Mismatch to non-autosomal reference samples: Match score indicates low similarity to non-autosomal control samples.
No coverage information: Not enough coverage information to call CNVs for the sample.
Few Gender Matches: Not enough reference samples with matching gender to call X and Y CNVs.
Fewer than two matched references: Fewer than two matched reference samples.
Inferred Gender: Gender inferred from the X chromosome coverage ratio and, when more than 50 targets are present on the Y chromosome, from Y chromosome coverage.
# CNV Events: Number of CNV events
# Flagged CNV Events: Number of flagged CNV events
# Unflagged CNV Events: Number of unflagged CNV events
# Hom Deletions: Number of homozygous deletion events
# Het Deletions: Number of heterozygous deletion events
# Duplications: Number of duplication events
Z-score IQR: Interquartile range of the Z-scores over all targets
Ratio IQR: Interquartile range of the ratios over all targets
Variants Considered: Variants considered for VAF content
Percent Difference: Average percent difference between sample and matched controls for autosomal regions
Reference Samples: Samples selected as matched controls
X Ratio: Ratio of X chromosome coverage to autosomal chromosome coverage
Y Ratio: Ratio of Y chromosome coverage to autosomal chromosome coverage
Non-Autosomal Percent Difference: Average percent difference between sample and matched controls for non-autosomal regions
Non-Autosomal Reference Samples: Samples selected as matched controls for non-autosomal regions
Karyotype: Chromosomal CNV information for this sample.
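The sketch below computes a few of these sample-level fields. The Z-score IQR definition and the High Median Z-score (0.4) and Low Sample Mean Depth (30) cutoffs come from the descriptions above. The X and Y ratio cutoffs in `infer_gender`, and the choice to report "Unknown" when X and Y disagree, are hypothetical. The threshold for the High IQR flag is not documented here, so the sketch only computes the IQR; all function names are illustrative.

```python
from statistics import median, quantiles


def iqr(values):
    # Interquartile range: third quartile minus first quartile.
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def sample_flags(z_scores, sample_mean_depth):
    flags = []
    if median(z_scores) > 0.4:
        flags.append("High Median Z-score")
    if sample_mean_depth < 30:
        flags.append("Low Sample Mean Depth")
    return flags


def infer_gender(x_ratio, y_ratio, n_y_targets,
                 x_male_cutoff=0.75, y_male_cutoff=0.25):
    # Hypothetical cutoffs: one X copy gives an X ratio near 0.5, two copies near 1.0.
    x_says_male = x_ratio < x_male_cutoff
    # Y coverage is only used when more than 50 targets lie on chromosome Y.
    if n_y_targets > 50:
        y_says_male = y_ratio > y_male_cutoff
        if x_says_male != y_says_male:
            return "Unknown"
    return "Male" if x_says_male else "Female"
```

Note that `statistics.quantiles` uses the "exclusive" method by default, so IQR values may differ slightly from tools that use other quartile conventions.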
Output in the Coverage Regions Table
Target-level CNV fields are appended to the Coverage Regions Table. These fields are computed for each coverage region.
CNV State: State of the CNV call for this target. One of homozygous deletion, heterozygous deletion, diploid, or duplication
Flags: QC flags for the target region.
Low Controls Depth: Mean read depth over controls fell below the threshold
High Controls Variation: Variation coefficient exceeded threshold
Within Regional IQR: Event is not significantly different from surrounding normal regions based on regional IQR.
Few Gender Matches: Not enough reference samples with matching gender to call X and Y CNVs
In Blacklist: Target was contained by a blacklist region
Z Score: Z-score of the target. Computed as (normalized target depth - mean depth across controls) / standard deviation
Ratio: Ratio of normalized target depth over mean depth across controls
Variants Considered: Variants considered for VAF content.
Normalized Mean Depth: (hidden by default) Target depth / Mean depth over controls
Avg. Normalized Control Depth: (hidden by default) Average normalized depth for the target in all the controls
Control Standard Dev.: (hidden by default) Standard deviation of normalized depth in all the controls
Avg. Control Depth: (hidden by default) Average raw mean depth for target over controls
GC Content: GC content of the target region.
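The control-based target flags can be sketched as follows. Low Controls Depth compares the average raw control depth with the Controls average target mean depth below setting, and High Controls Variation compares the coefficient of variation (standard deviation divided by mean) with the Controls variation coefficient above setting. Computing the coefficient on normalized control depths, the default threshold values, and the function names are assumptions for illustration.

```python
from statistics import mean, stdev


def control_flags(raw_control_depths, normalized_control_depths,
                  min_mean_depth=20.0, max_variation=0.3):
    """Return QC flags for one target from its reference-sample depths.
    Threshold defaults are hypothetical; both lists need at least 2 controls."""
    flags = []
    # Low Controls Depth: average raw depth over controls is below the threshold.
    if mean(raw_control_depths) < min_mean_depth:
        flags.append("Low Controls Depth")
    # High Controls Variation: coefficient of variation above the threshold.
    mu = mean(normalized_control_depths)
    cv = stdev(normalized_control_depths) / mu if mu > 0 else float("inf")
    if cv > max_variation:
        flags.append("High Controls Variation")
    return flags


def in_blacklist(chrom, start, stop, blacklist):
    # Flag a target when a single blacklist region fully contains it.
    return any(c == chrom and b_start <= start and stop <= b_stop
               for c, b_start, b_stop in blacklist)
```

A target with consistent but shallow control coverage gets only Low Controls Depth, while one whose controls disagree strongly gets High Controls Variation even at high depth.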
Variants by CNVs Table
This composite table view includes all of the CNVs that cover one or more variants from the filtered Variant table. The CNVs appear in the left-hand table and the corresponding variants in the right-hand table. To view the variants that fall within each CNV, change the row selection in the CNV table.
Output in the Variant Table
Variants will be matched to any CNVs they fall within. The values for each of the matching CNVs will be listed in their respective fields which are appended to the Variant table.
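The matching step can be sketched as an interval check. This assumes inclusive start and stop coordinates, which this section does not specify; `CNV`, `matching_cnvs`, and `cnv_fields` are hypothetical names, and only two of the appended fields are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    # Hypothetical minimal CNV record.
    chrom: str
    start: int
    stop: int
    state: str


def matching_cnvs(chrom, pos, cnvs):
    # Assumption: inclusive coordinates, so a variant at start or stop is inside.
    return [c for c in cnvs if c.chrom == chrom and c.start <= pos <= c.stop]


def cnv_fields(chrom, pos, cnvs):
    # One list entry per matching CNV, in the order the CNVs were given.
    hits = matching_cnvs(chrom, pos, cnvs)
    return {
        "Region": [f"{c.chrom}:{c.start}-{c.stop}" for c in hits],
        "CNV State": [c.state for c in hits],
    }
```

A variant that falls within two CNVs gets two values in each appended field; a variant outside every CNV gets empty fields.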