4. Performing Sample QC

Once the CNV caller finishes computing results, a new table labeled CNVs is created. This table contains information about each CNV called by the algorithm. Before examining these results, however, we should perform sample-level quality control by exploring the sample table, which is now populated with several useful metrics from the CNV algorithm.

To open the sample table in VarSeq:

  • Select Samples from the drop-down menu directly above the left side of the current table.

Figure 4-1. Selecting Samples Table

You will notice that the CNV caller has populated the sample table with a number of fields under the heading “Copy Number Variants”.


Figure 4-2. Samples table view

The most useful field for sample QC is the “Sample Flags” field. This field will list one or more of the following flags if the sample fails any of our quality tests:

  • High IQR: High interquartile range for Z-score and ratio. This flag indicates that there is high variance between targets for one or more of the evidence metrics.
  • Low Sample Mean Depth: Sample mean depth below 30.
  • Mismatch to reference samples: Match score indicates low similarity to control samples.
  • Mismatch to non-autosomal reference samples: Match score indicates low similarity to non-autosomal control samples.
  • Few Gender Matches: Not enough reference samples with matching gender to call X and Y CNVs.
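VarSeq computes these flags internally, and this tutorial does not describe its implementation. As a rough illustration of what two of the tests measure, the sketch below checks the interquartile range (Q3 minus Q1) of each per-target evidence metric and the sample's mean depth. The mean-depth cutoff of 30 comes from the flag description above; the IQR cutoffs, the function names, and the input layout are hypothetical assumptions chosen only for the example.

```python
from statistics import mean, quantiles

# Taken from the "Low Sample Mean Depth" flag description.
MIN_MEAN_DEPTH = 30

# HYPOTHETICAL cutoffs: the tutorial does not state the thresholds VarSeq uses.
IQR_CUTOFFS = {"z-score": 0.5, "ratio": 0.3}


def iqr(values):
    """Interquartile range (Q3 - Q1) using the stdlib's default 'exclusive' method."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


def sample_flags(target_depths, evidence, iqr_cutoffs=IQR_CUTOFFS):
    """Return the subset of QC flags this sketch can evaluate for one sample.

    target_depths: per-target mean depths for the sample.
    evidence: dict mapping metric name ("z-score", "ratio") to per-target values.
    """
    flags = []
    # High IQR: high variance between targets for one or more evidence metrics.
    if any(iqr(values) > iqr_cutoffs[name] for name, values in evidence.items()):
        flags.append("High IQR")
    # Low Sample Mean Depth: sample mean depth below 30.
    if mean(target_depths) < MIN_MEAN_DEPTH:
        flags.append("Low Sample Mean Depth")
    return flags


# Example: a low-coverage sample whose evidence metrics are otherwise tight.
print(sample_flags(
    target_depths=[20, 25, 28, 22],
    evidence={"z-score": [0.1, -0.1, 0.0, 0.2, -0.2],
              "ratio": [0.9, 1.0, 1.1, 1.0, 0.95]},
))
```

The mismatch and gender-match flags depend on how VarSeq scores similarity to the reference set, which this section does not specify, so they are left out of the sketch.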

If any of the first three flags are listed for a given sample, then all CNV calls associated with that sample are most likely unreliable. If either of the last two flags is present, then CNV calls on the non-autosomal chromosomes (X and Y) are likely unreliable.
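The triage rule above can be written as a small filter, for example when post-processing an exported CNV list outside VarSeq. This is a sketch under stated assumptions: the function name and the representation of each CNV as a dict with a `"chrom"` key are hypothetical, and only the flag names and their meaning come from this section.

```python
# Flags that make every CNV call for the sample unreliable.
SAMPLE_WIDE_FLAGS = {
    "High IQR",
    "Low Sample Mean Depth",
    "Mismatch to reference samples",
}
# Flags that make only non-autosomal (X and Y) calls unreliable.
NON_AUTOSOMAL_FLAGS = {
    "Mismatch to non-autosomal reference samples",
    "Few Gender Matches",
}


def _bare_chrom(name):
    """Normalize 'chrX' / 'X' style names to 'X' (case-insensitive prefix)."""
    return name[3:] if name.lower().startswith("chr") else name


def reliable_calls(cnvs, flags):
    """Keep only the CNV calls that the sample's QC flags do not invalidate.

    cnvs: list of dicts, each with a hypothetical "chrom" key.
    flags: iterable of flag names from the "Sample Flags" field.
    """
    flags = set(flags)
    if flags & SAMPLE_WIDE_FLAGS:
        return []  # every call for this sample is suspect
    if flags & NON_AUTOSOMAL_FLAGS:
        return [c for c in cnvs if _bare_chrom(c["chrom"]).upper() not in {"X", "Y"}]
    return list(cnvs)


calls = [{"chrom": "chr1"}, {"chrom": "chrX"}, {"chrom": "7"}]
print(reliable_calls(calls, ["Few Gender Matches"]))
print(reliable_calls(calls, ["Low Sample Mean Depth"]))
```

Dropping the calls outright is only one choice; in practice you might instead keep them and mark them for manual review.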

The “Mismatch to reference samples” flag can often be resolved by rerunning the algorithm once more samples have been added to the reference set.

In the current project, two samples have been flagged: Samples 34 and 41 both carry the “Low Sample Mean Depth” flag.

In addition to QC flags, the sample table provides summary information about the number of CNVs called, the inferred gender of the sample, the reference samples chosen, and the percent difference between each sample and its reference set.