CADD

If licensed, VarSeq can annotate variants with scores from CADD (Combined Annotation Dependent Depletion).

CADD can be used to score the effect, or deleteriousness, of mutations, including single nucleotide variants (SNVs) and insertions or deletions (InDels). Currently, CADD scoring is only available for the human reference sequence GRCh_37 with the 1000 Genomes definition of the MT (mitochondrial) region. See the CADD documentation for specifics on the provided data.

Note

To add the CADD annotation source to your VarSeq license, contact your account manager or support@goldenhelix.com.

To access this annotation, go to Add > Annotation... then select the source from the Secure Annotations/CADD location of the Data Source Library.

CADD scores are precomputed for all SNVs and approximately 20 million known insertions and deletions. Estimates are computed for novel InDels.

Estimating CADD Scores

The method used to estimate CADD scores for InDels not present in the database handles frameshift InDels and in-frame InDels (length divisible by 3) separately.

The estimated CADD score is taken to be:

  • Frameshift Insertions: The maximum of all SNV scores at both flanking bases
  • Frameshift Deletions: The maximum of all SNV scores at all spanning bases
  • In-frame Insertions: The average of all SNV scores at both flanking bases
  • In-frame Deletions: The average of all SNV scores at all spanning bases
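The four rules above can be illustrated with a minimal Python sketch. It is not VarSeq's implementation; several details are assumptions: the function names, the data layout (a dictionary of per-position SNV scores for a single chromosome, one score per possible alternate base), the VCF-style allele representation with one shared leading anchor base, and the choice of the two bases on either side of the insertion point as the "flanking bases". The numbers in the example tables are made up.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical layout (assumption): precomputed SNV scores for one
# chromosome, keyed by 1-based position, one score per alternate base.
SnvScores = Dict[int, List[float]]
# Hypothetical layout (assumption): precomputed InDel scores keyed by (pos, ref, alt).
IndelScores = Dict[Tuple[int, str, str], float]


def estimate_indel_score(snv_scores: SnvScores, pos: int,
                         ref: str, alt: str) -> Optional[float]:
    """Estimate a score for an InDel that has no precomputed CADD score.

    Assumes VCF-style alleles sharing one leading anchor base at `pos`,
    e.g. ref="A", alt="ACG" (insertion) or ref="ACG", alt="A" (deletion).
    """
    if len(ref) == len(alt) or min(len(ref), len(alt)) != 1 or ref[0] != alt[0]:
        raise ValueError("expected a simple InDel with one shared anchor base")
    length = abs(len(ref) - len(alt))
    if len(alt) > len(ref):
        # Insertion after the anchor base: assumed flanking bases are
        # the anchor base and the next reference base.
        positions = [pos, pos + 1]
    else:
        # Deletion: spanning bases are the deleted reference positions.
        positions = list(range(pos + 1, pos + 1 + length))

    # Pool every SNV score at the chosen positions; positions without
    # scores are skipped, and None is returned if nothing was found.
    scores = [s for p in positions for s in snv_scores.get(p, [])]
    if not scores:
        return None

    # A length change that is not a multiple of 3 shifts the reading frame.
    frameshift = length % 3 != 0
    return max(scores) if frameshift else mean(scores)


def cadd_score(precomputed: IndelScores, snv_scores: SnvScores,
               pos: int, ref: str, alt: str) -> Optional[float]:
    """Use the precomputed score for a known InDel, otherwise estimate one."""
    key = (pos, ref, alt)
    if key in precomputed:
        return precomputed[key]
    return estimate_indel_score(snv_scores, pos, ref, alt)


# Made-up example data for illustration only.
EXAMPLE_SNV_SCORES: SnvScores = {
    100: [1.0, 2.0, 3.0],
    101: [4.0, 5.0, 6.0],
    102: [7.0, 8.0, 9.0],
}
EXAMPLE_PRECOMPUTED: IndelScores = {(100, "A", "AC"): 12.3}

print(estimate_indel_score(EXAMPLE_SNV_SCORES, 100, "A", "AC"))    # 6.0
print(estimate_indel_score(EXAMPLE_SNV_SCORES, 100, "A", "ACGT"))  # 3.5
print(estimate_indel_score(EXAMPLE_SNV_SCORES, 99, "TACG", "T"))   # 5.0
```

With this made-up table, a 1-base insertion after position 100 is a frameshift, so it takes the maximum of the six scores at positions 100 and 101 (6.0). A 3-base insertion at the same spot is in-frame and takes their average (3.5). A 3-base deletion of positions 100 to 102 averages all nine scores (5.0). Using the maximum for frameshifts reflects the rule that a frame-disrupting change is scored as severely as the worst single-base change nearby.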