3.10.8. Annotate Transcripts

This algorithm annotates variants against overlapping transcripts. It produces several fields for each variant, each of which is described in the field documentation generated when the algorithm is run.

To handle the presence of multiple transcripts, this algorithm produces three column groups:

  • Summary Fields that condense the per-transcript annotations into a single clinically relevant or combined annotation.

  • Transcript Interactions that include the per-transcript annotations for all transcripts.

  • Aux Fields that pass through the transcript-level additional auxiliary fields from the annotation source.

To aggregate the per-transcript annotations into a single value that can be used for filtering or exporting the variant table in a meaningful way, there are two strategies and corresponding summary fields:

  • Combined: For fields like Sequence Ontology and Gene Region, this takes the most damaging annotation result from all transcripts and is useful for conservative filtering.

  • Clinically Relevant: These fields show annotation results for the gene’s clinically relevant transcript.

For details on the heuristics used to select clinically relevant transcripts see Gene and Transcript Preferences.
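The two aggregation strategies can be illustrated with a minimal sketch. The severity ranking, function names, and transcript IDs below are hypothetical and chosen only for illustration. This section does not give the ordering the algorithm actually uses to decide which annotation is "most damaging", nor how the clinically relevant transcript is identified internally.

```python
# Hypothetical severity ranking, most damaging first. The real ordering
# used by the algorithm is not specified in this section.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


def combined_annotation(per_transcript_terms):
    """Combined strategy: return the most damaging term across all transcripts.

    Returns None when the variant overlaps no transcripts.
    """
    if not per_transcript_terms:
        return None
    # Terms missing from the ranking sort last, so they never outrank a known term.
    return min(per_transcript_terms, key=lambda t: RANK.get(t, len(SEVERITY_ORDER)))


def clinically_relevant_annotation(per_transcript, clinically_relevant_id):
    """Clinically Relevant strategy: report only the chosen transcript's term.

    per_transcript maps a transcript ID to its annotation term.
    """
    return per_transcript.get(clinically_relevant_id)


# Illustrative transcript IDs only.
annotations = {"NM_000001.1": "intron_variant", "NM_000002.1": "missense_variant"}
print(combined_annotation(list(annotations.values())))
print(clinically_relevant_annotation(annotations, "NM_000001.1"))
```

In this sketch the Combined value can be more severe than the Clinically Relevant value, which is why the Combined fields suit conservative filtering.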

Requirements

A gene source is needed to run the algorithm.

Options

  • Only annotate verified mRNA transcripts: If checked, only verified transcripts will be included.

  • Include splice site predictions: If checked, includes predictions from four algorithms, plus a summary vote across them, on whether the variant disrupts a nearby canonical splice site.

  • Amino Acid Notation: Amino acids can be represented as either three-letter or one-letter abbreviations.

  • Splice Site Boundaries: The distances used to classify splice site boundaries can be adjusted as needed.

    • Splice Donor Distance: Default is 2 bp

    • Splice Acceptor Distance: Default is 2 bp

    • Splice Region Exonic Distance: Default is 3 bp

    • Splice Region Intronic Distance: Default is 8 bp

  • Preferred Transcript(s): A list of transcripts that should be preferred as the clinically relevant transcript.
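As a rough illustration of how the four Splice Site Boundaries distances could partition positions around an exon boundary, consider the sketch below. The class and function names are hypothetical. The mapping of distances to classes is also an assumption: positions within the donor or acceptor distance on the intronic side are treated as the core splice site, and positions within the exonic or intronic splice region distance are treated as splice region. The algorithm's actual classification rules are not spelled out in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpliceBoundaries:
    # Defaults taken from the option list above (all in bp).
    donor_distance: int = 2
    acceptor_distance: int = 2
    region_exonic_distance: int = 3
    region_intronic_distance: int = 8


def classify_splice_position(distance, in_intron, site, bounds=SpliceBoundaries()):
    """Classify a position near an exon/intron boundary (assumed rules).

    distance:  1-based distance in bp from the boundary.
    in_intron: True if the position lies on the intronic side.
    site:      "donor" or "acceptor".
    Returns "splice_donor", "splice_acceptor", "splice_region", or None.
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    if site not in ("donor", "acceptor"):
        raise ValueError("site must be 'donor' or 'acceptor'")

    if in_intron:
        core = bounds.donor_distance if site == "donor" else bounds.acceptor_distance
        if distance <= core:
            return f"splice_{site}"
        if distance <= bounds.region_intronic_distance:
            return "splice_region"
        return None

    # Exonic side: only the splice region applies in this sketch.
    if distance <= bounds.region_exonic_distance:
        return "splice_region"
    return None


print(classify_splice_position(1, True, "donor"))
print(classify_splice_position(5, True, "acceptor"))
print(classify_splice_position(4, False, "donor"))
```

Widening a distance, for example `SpliceBoundaries(region_intronic_distance=20)`, would cause more intronic positions to be reported as splice region under these assumed rules.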

Output

Three column groups will be created. The first is a summary group, which collapses overlapping annotations into a single clinically relevant or combined annotation. The second is a full annotation group, which details the interaction for each transcript-variant pair. The third is a set of columns containing the information from the underlying source.

If multiple transcripts overlap the variant, the results for each field will be joined together in a list. If a variant is intergenic, non-applicable fields will be filled in with missing values.
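The list-joining behavior described above might look like the following sketch. The field names, the use of `None` as the missing value, and the function names are all assumptions made for illustration. The actual column names and missing-value representation are defined by the software and the annotation source.

```python
MISSING = None  # Stand-in for the software's missing value (assumption).


def join_field(values):
    """Join per-transcript values for one field into a list.

    An empty input (no overlapping transcripts) yields the missing value.
    """
    if not values:
        return MISSING
    return list(values)


def build_row(field_names, transcripts):
    """Build one variant's output row.

    transcripts is a list of dicts, one per overlapping transcript,
    each mapping a field name to that transcript's annotation.
    """
    return {
        name: join_field([t.get(name, MISSING) for t in transcripts])
        for name in field_names
    }


# Hypothetical field names and values.
fields = ["gene", "effect"]
overlapping = [
    {"gene": "GENE1", "effect": "missense_variant"},
    {"gene": "GENE1", "effect": "intron_variant"},
]
print(build_row(fields, overlapping))  # one list entry per transcript
print(build_row(fields, []))           # intergenic: every field is missing
```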

Splice Site Detection Boundaries

Callable splice site positions for each algorithm:

  • MaxEntScan

    • Donor: -3, +6

    • Acceptor: -18, +3

  • GeneSplicer

    • Donor/Acceptor: -80, +80

  • NNSplice

    • Donor: -7, +8

    • Acceptor: -21, +20

  • PWM

    • Donor: -3, +4

    • Acceptor: -12, +1