LoH Caller

This algorithm calls Loss of Heterozygosity (LoH) events based on variant allele frequency. It is modeled on the H3M2 algorithm.

H3M2 uses a heterogeneous Hidden Markov Model (HMM) that incorporates inter-marker distances to identify LoH events from allele frequency data. The model has three hidden states representing homozygous, non-homozygous, and trisomy states of the genome, while the observations are the variant allele frequency values at each position. The emission probability distributions are defined by truncated Gaussian mixture models, one per state, as follows:

P(AF_i| hom_i) = c_1N(0, F \cdot \sigma) + c_3N(1, F \cdot \sigma_3 )

P(AF_i| het_i) = c_1N(0, F \cdot \sigma) + c_2N(\frac{1}{2}, \sigma_2) + c_3N(1, F \cdot \sigma_3 )

P(AF_i| tri_i) = c_1N(0, F \cdot \sigma) + c_2N(\frac{1}{3}, \sigma_2) + c_3N(\frac{2}{3}, \sigma_3) + c_4N(1, F \cdot \sigma_4 )

where c_k is the weight of the k-th component of the mixture model, each \sigma term sets the width of its component, F is a parameter used to modulate the spread of the distributions, and AF_i denotes the allele frequency at position i. The values hom_i, het_i, and tri_i denote the homozygous, heterozygous (non-homozygous), and trisomy states at position i, respectively.
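As a rough illustration of the emission model above, the following Python sketch evaluates the three mixtures using Gaussians truncated to the interval [0, 1]. The mixture weights, the value of F, the single shared width `sd` (standing in for the separate \sigma terms), and all function names are assumptions chosen for the example; they are not values or names taken from the LoH Caller or from H3M2.

```python
import math


def truncated_normal_pdf(x, mean, sd, lo=0.0, hi=1.0):
    """Density at x of a Gaussian N(mean, sd) truncated to [lo, hi]."""
    if x < lo or x > hi:
        return 0.0

    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))

    pdf = math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    # Renormalize so the density integrates to 1 over [lo, hi].
    return pdf / (cdf(hi) - cdf(lo))


def state_components(state, f=0.5, sd=0.05):
    """Return (weight, mean, width) triples for one hidden state.

    All numbers are hypothetical. One shared width `sd` stands in for the
    separate sigma terms of the formulas, and `f` plays the role of F.
    """
    if state == "hom":
        return [(0.5, 0.0, f * sd), (0.5, 1.0, f * sd)]
    if state == "het":
        return [(0.25, 0.0, f * sd), (0.5, 0.5, sd), (0.25, 1.0, f * sd)]
    if state == "tri":
        return [(0.2, 0.0, f * sd), (0.3, 1 / 3, sd),
                (0.3, 2 / 3, sd), (0.2, 1.0, f * sd)]
    raise ValueError(f"unknown state: {state}")


def emission(state, af, f=0.5, sd=0.05):
    """P(AF | state) as a weighted sum of truncated Gaussians."""
    return sum(w * truncated_normal_pdf(af, m, s)
               for w, m, s in state_components(state, f, sd))


if __name__ == "__main__":
    for af in (0.02, 0.35, 0.5, 0.97):
        best = max(("hom", "het", "tri"), key=lambda s: emission(s, af))
        print(f"AF={af:.2f} -> most likely state by emission alone: {best}")
```

With these illustrative parameters, a VAF near 0.5 scores highest under the heterozygous mixture and a VAF near 1/3 scores highest under the trisomy mixture. In the full model these emission scores are combined with the transition probabilities, so a single variant does not decide the state on its own.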

The transition probability for moving from the non-homozygous state to either the homozygous or the trisomy state is given by:

P(hom_i \vee tri_i | het_{i-1}) = p_1 (1 - e^{- \frac{d_i}{d_{norm}}})

where p_1 denotes the likelihood of moving out of the non-homozygous state, d_i is the genomic distance between positions i-1 and i, and d_{norm} modulates the effect of the genomic distance on the transition probabilities. The probability is 0 when d_i = 0 and approaches p_1 as d_i grows, so closely spaced markers are unlikely to change state. The probability of moving out of a trisomy or homozygous state is defined similarly, using a parameter p_2 to specify the likelihood of the state change.
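The distance-dependent factor in the transition formula above can be sketched in a few lines of Python. The function name and the values used for p_1 and d_{norm} are hypothetical and serve only to show how the probability grows with inter-marker distance.

```python
import math


def leave_probability(p, d, d_norm):
    """p * (1 - exp(-d / d_norm)): chance of changing state across a gap of d bases."""
    if d < 0 or d_norm <= 0:
        raise ValueError("d must be >= 0 and d_norm must be > 0")
    return p * (1.0 - math.exp(-d / d_norm))


if __name__ == "__main__":
    p_1 = 0.1          # hypothetical value
    d_norm = 100_000   # hypothetical value, in base pairs
    for d in (100, 10_000, 1_000_000):
        leave = leave_probability(p_1, d, d_norm)
        # The probability of staying in the same state is the complement.
        print(f"d={d:>9,}: leave het={leave:.5f}, stay het={1.0 - leave:.5f}")
```

The same function would be called with p_2 for transitions out of the homozygous or trisomy states. How a transition out of one state is divided between the two remaining states is not specified here, so the sketch does not build a full transition matrix.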

Requirements

The algorithm requires the Variant Allele Frequency (VAF) and Genotype Quality (GQ) sample-level fields.

Output

The LoH Caller algorithm generates a LoHs table view. This table includes records for all called events, with the following fields:

  • Region: Genomic coordinates (Chr: Start-Stop).
  • # Samples: Number of samples in the event.
  • Span: The width of the event, computed as the difference between the stop and start positions.
  • LoH Event: True if the LoH is present in the current sample.
  • LoH State: The state of the event; either LoH or Trisomy.
  • Variants Considered: Number of variants in the LoH event.
  • Percent in Expected State: Percentage of variants with VAF consistent with the called LoH state.
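The numeric output fields can be illustrated with a short Python sketch. The exact rule the LoH Caller uses to decide whether a VAF is "consistent" with a state is not described here, so the tolerance `tol`, the list of expected VAF values, and the function name below are all assumptions made for the example.

```python
def summarize_event(start, stop, vafs, expected, tol=0.15):
    """Compute Span, Variants Considered, and Percent in Expected State.

    `expected` lists the VAF values assumed to be typical of the called
    state, and `tol` is a hypothetical tolerance around each of them.
    """
    in_state = sum(
        1 for v in vafs if any(abs(v - e) <= tol for e in expected)
    )
    percent = 100.0 * in_state / len(vafs) if vafs else 0.0
    return {
        "Span": stop - start,
        "Variants Considered": len(vafs),
        "Percent in Expected State": percent,
    }


if __name__ == "__main__":
    # Hypothetical LoH event: VAFs are expected near 0 or 1.
    summary = summarize_event(1000, 5000, [0.0, 0.98, 0.5, 1.0], (0.0, 1.0))
    print(summary)
```

In this made-up event, three of the four variants lie within the tolerance of 0 or 1, so the sketch reports 75 percent in the expected state over a span of 4000 bases.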