# Formulas for Computing Linkage Disequilibrium (LD)

Two approaches are available for computing linkage disequilibrium (LD). They differ in the method used to impute the two-marker haplotype frequencies on which the LD computations depend: expectation-maximization (EM) or the composite haplotype method (CHM).

## Computing LD using Expectation-Maximization (EM)

First, the EM method (see *Expectation Maximization (EM)* and *Haplotype Frequency Estimation Methods*) is used to impute the two-marker haplotype probabilities $p_{ij}$ for the $i$-th and $j$-th allele in the first and second markers, respectively.

Using these, the signed $D$ statistics may be calculated as

$$D_{ij} = p_{ij} - p_i q_j$$

where $p_i$ is the frequency of allele $i$ in the first marker, and $q_j$ is the frequency of allele $j$ in the second marker.
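As a minimal sketch of this step, the following Python computes the signed $D$ statistics from a table of two-marker haplotype probabilities. The function name `signed_d` and the example table are hypothetical. The marginal allele frequencies are taken as the row and column sums of the table, which assumes the probabilities sum to one.

```python
def signed_d(hap):
    """Return (p, q, d) for a table hap[i][j] of two-marker haplotype probabilities.

    p[i] and q[j] are the marginal allele frequencies (row and column sums),
    and d[i][j] = hap[i][j] - p[i] * q[j].
    """
    m, n = len(hap), len(hap[0])
    p = [sum(hap[i]) for i in range(m)]
    q = [sum(hap[i][j] for i in range(m)) for j in range(n)]
    d = [[hap[i][j] - p[i] * q[j] for j in range(n)] for i in range(m)]
    return p, q, d


# Hypothetical haplotype probabilities for two bi-allelic markers.
hap = [[0.4, 0.1],
       [0.2, 0.3]]
p, q, d = signed_d(hap)
print("first-marker frequencies:", p)
print("second-marker frequencies:", q)
print("signed D:", d)
```

Each row and each column of the resulting table sums to zero, so in the two-allele case all four entries share the same magnitude and differ only in sign.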

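If there are $m$ alleles in the first marker and $n$ alleles in the second, a chi-squared statistic with $(m-1)(n-1)$ degrees of freedom may be written as

$$\chi^2_{(m-1)(n-1)} = 2N \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{D_{ij}^2}{p_i q_j}$$

where $2N$ is the number of haplotypes in the sample ($N$ diploid individuals).

$R^2$ may then be computed by taking the p-value as

$$p = P\left(X \ge \chi^2_{(m-1)(n-1)}\right)$$

where $X$ follows a chi-squared distribution with $(m-1)(n-1)$ degrees of freedom, and obtaining $R^2$ from the inverse distribution for one degree of freedom as

$$R^2 = \frac{F_1^{-1}(1 - p)}{2N}$$

where $F_1$ is the cumulative distribution function of the chi-squared distribution with one degree of freedom.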
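The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the chi-squared statistic is assumed to be scaled by the number of haplotypes (the parameter `n_haplotypes`), every allele is assumed to have nonzero frequency, and all function names are hypothetical. To stay within the standard library, the chi-squared upper tail uses the closed forms available for integer degrees of freedom, and the one-degree-of-freedom inverse is found by bisection.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-squared X with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k+1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{k} x^(i-1) / (2i-1)!!
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def chi2_isf_1df(p_value, upper=1.0e4):
    """Invert the one-degree-of-freedom upper tail by bisection.

    Requires 0 < p_value <= 1; p-values that underflow to zero are not handled.
    """
    lower = 0.0
    for _ in range(200):
        mid = 0.5 * (lower + upper)
        if chi2_sf(mid, 1) > p_value:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


def r_squared(hap, n_haplotypes):
    """R^2 from haplotype probabilities hap[i][j] via the p-value round trip."""
    m, n = len(hap), len(hap[0])
    p = [sum(hap[i]) for i in range(m)]
    q = [sum(hap[i][j] for i in range(m)) for j in range(n)]
    # Chi-squared statistic with (m-1)(n-1) degrees of freedom (assumed scaling).
    chi2 = n_haplotypes * sum(
        (hap[i][j] - p[i] * q[j]) ** 2 / (p[i] * q[j])
        for i in range(m) for j in range(n)
    )
    p_value = chi2_sf(chi2, (m - 1) * (n - 1))
    return chi2_isf_1df(p_value) / n_haplotypes


# Hypothetical three-allele by two-allele haplotype table, 100 haplotypes.
print(r_squared([[0.3, 0.1], [0.1, 0.2], [0.1, 0.2]], 100))
```

When both markers have two alleles, the statistic already has one degree of freedom, so the round trip through the p-value returns the statistic unchanged and the result is simply the statistic divided by `n_haplotypes`.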

For the two-locus two-allele case, this procedure simplifies to the following direct formula

$$R^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{D_{ij}^2}{p_i q_j}$$

which, for this two-locus two-allele case, may be shown to be equivalent to

$$R^2 = \frac{D_{ij}^2}{p_i (1 - p_i)\, q_j (1 - q_j)}$$

where $i$ may be chosen as either one of the alleles in the first marker and $j$ may be chosen as either one of the alleles in the second marker.
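The equivalence of the two forms can be checked numerically. The sketch below, with hypothetical function names and a made-up haplotype table, evaluates the double-sum form once and the single-entry form for each of the four allele choices. It assumes $D_{ij} = p_{ij} - p_i q_j$ with allele frequencies taken as the row and column sums of the table.

```python
def marginals(hap):
    """Row sums (first-marker frequencies) and column sums (second-marker frequencies)."""
    p = [sum(row) for row in hap]
    q = [hap[0][j] + hap[1][j] for j in range(2)]
    return p, q


def r2_sum_form(hap):
    """Sum of D_ij^2 / (p_i q_j) over all four haplotypes."""
    p, q = marginals(hap)
    return sum(
        (hap[i][j] - p[i] * q[j]) ** 2 / (p[i] * q[j])
        for i in range(2) for j in range(2)
    )


def r2_single_entry(hap, i, j):
    """D_ij^2 / (p_i (1 - p_i) q_j (1 - q_j)) for one chosen pair of alleles."""
    p, q = marginals(hap)
    d = hap[i][j] - p[i] * q[j]
    return d * d / (p[i] * (1.0 - p[i]) * q[j] * (1.0 - q[j]))


hap = [[0.4, 0.1],
       [0.2, 0.3]]  # hypothetical haplotype probabilities
print(r2_sum_form(hap))
for i in range(2):
    for j in range(2):
        print(i, j, r2_single_entry(hap, i, j))
```

All five printed values agree up to floating-point rounding, which reflects the fact that the four $D_{ij}$ differ only in sign in the two-allele case.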

## Computing LD using the Composite Haplotype Method (CHM)

### Multi-Allelic

If there are $m$ alleles in the first marker and $n$ alleles in the second, where either $m > 2$ or $n > 2$ or both, and using the same notation for $p_i$ and $q_j$ as above, a chi-squared statistic with $(m-1)(n-1)$ degrees of freedom may be written as

where

and is defined as in *Composite Haplotype Method (CHM)*.
Here, we are effectively using

which includes both inter- and intra-gametic component frequencies, as our haplotype frequencies.

$R^2$ may then be computed by taking the p-value as

and obtaining $R^2$ from the inverse distribution for one degree of freedom as

### Bi-Allelic

For the two-locus two-allele case, and using the notation of *Composite Haplotype Method (CHM)*, we compute $R^2$ using the following direct formula

$$R^2 = \frac{\Delta_{AB}^2}{\left(p_A (1 - p_A) + D_A\right)\left(p_B (1 - p_B) + D_B\right)}$$

where $D_A$ and $D_B$ are the Hardy-Weinberg coefficients for allele $A$ of the first marker and allele $B$ of the second marker, respectively. This formula may be thought of as putting a “Hardy-Weinberg correction” onto the formula

$$R^2 = \frac{\Delta_{AB}^2}{p_A (1 - p_A)\, p_B (1 - p_B)}$$

which is only completely accurate under the special circumstance of random mating (perfect Hardy-Weinberg equilibrium over the two-marker haplotypes), for which $\Delta_{AB}$ approximates $D_{AB}$ and is an unbiased estimate of $D_{AB}$.
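The bi-allelic computation can be sketched from a 3-by-3 table of two-marker genotype counts. Several things here are assumptions rather than statements from this section: the composite disequilibrium $\Delta_{AB}$ is estimated with the usual composite (Weir-style) genotype-count estimator, which is assumed to match the definition in *Composite Haplotype Method (CHM)*; the Hardy-Weinberg coefficient is taken as the homozygote frequency minus the squared allele frequency; and the function name `composite_r2` and the example counts are hypothetical.

```python
def composite_r2(counts):
    """Return (corrected, uncorrected) R^2 for two bi-allelic markers.

    counts[a][b] is the number of samples carrying a copies of allele A at the
    first marker and b copies of allele B at the second marker (a, b in 0..2).
    Both markers are assumed to be polymorphic.
    """
    n = float(sum(sum(row) for row in counts))
    cells = [(a, b) for a in range(3) for b in range(3)]
    p_a = sum(a * counts[a][b] for a, b in cells) / (2.0 * n)
    p_b = sum(b * counts[a][b] for a, b in cells) / (2.0 * n)
    # Hardy-Weinberg coefficients: homozygote frequency minus squared allele frequency.
    d_a = sum(counts[2]) / n - p_a ** 2
    d_b = sum(counts[a][2] for a in range(3)) / n - p_b ** 2
    # Composite count of A-with-B pairings, within and between gametes (assumed estimator).
    n_ab = 2.0 * counts[2][2] + counts[2][1] + counts[1][2] + 0.5 * counts[1][1]
    delta = n_ab / n - 2.0 * p_a * p_b
    var_a = p_a * (1.0 - p_a)
    var_b = p_b * (1.0 - p_b)
    corrected = delta ** 2 / ((var_a + d_a) * (var_b + d_b))
    uncorrected = delta ** 2 / (var_a * var_b)
    return corrected, uncorrected


# Hypothetical sample in which the two markers always carry matching genotypes.
counts = [[30, 0, 0],
          [0, 50, 0],
          [0, 0, 20]]
print(composite_r2(counts))
```

In this made-up sample the genotype frequencies depart slightly from Hardy-Weinberg proportions, so the corrected and uncorrected values differ; the corrected value is 1 up to rounding, while the uncorrected one falls slightly below it.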

It may be shown that for the circumstance of perfect linkage disequilibrium, the result of using the “Hardy-Weinberg correction” formula is equivalent to

$$R^2 = 1$$

## The D-Prime Statistic

If the minor allele frequencies of the respective markers are small, the magnitude of the $D$ statistic cannot get very large, even if the markers are in almost complete linkage disequilibrium, compared to the magnitude it could have had if the allele frequencies of the markers were almost equal.

The D-prime statistic was designed to compensate for this. $D'_{ij}$ is defined as $D_{ij}$ normalized by the maximum value that $D_{ij}$ could have given the allele frequencies in each of the markers.
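As a quick worked illustration of why the normalization matters, consider the upper bound on a positive $D$ for two bi-allelic markers. The allele frequencies below are chosen purely for illustration, with $p$ and $q$ denoting the frequencies of the chosen alleles at the first and second marker.

```latex
% Upper bound on a positive D for two bi-allelic markers
D_{\max} = \min\bigl(p(1-q),\ (1-p)\,q\bigr)

% Rare alleles: p = q = 0.05
D_{\max} = \min(0.05 \times 0.95,\ 0.95 \times 0.05) = 0.0475

% Balanced alleles: p = q = 0.5
D_{\max} = \min(0.5 \times 0.5,\ 0.5 \times 0.5) = 0.25
```

Two markers in complete disequilibrium would therefore show $D = 0.0475$ in the first case and $D = 0.25$ in the second, even though the strength of association is the same.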

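Specifically,

$$D'_{ij} = \frac{D_{ij}}{\min\left(p_i q_j,\ (1 - p_i)(1 - q_j)\right)}$$

if $D_{ij} < 0$, and

$$D'_{ij} = \frac{D_{ij}}{\min\left(p_i (1 - q_j),\ (1 - p_i)\, q_j\right)}$$

otherwise.

The overall D-prime statistic is defined as

$$D' = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i q_j \left|D'_{ij}\right|$$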
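The definition above can be sketched in Python as follows. The sketch assumes the conventional normalization: a negative $D_{ij}$ is divided by the smaller of $p_i q_j$ and $(1-p_i)(1-q_j)$, any other $D_{ij}$ by the smaller of $p_i(1-q_j)$ and $(1-p_i)q_j$, and the overall statistic weights each $|D'_{ij}|$ by $p_i q_j$. The function name `d_prime` and the example tables are hypothetical, and a zero denominator is treated as giving $D'_{ij} = 0$, which is a choice made here for robustness.

```python
def d_prime(hap):
    """Return (per-pair D' table, overall D') from haplotype probabilities hap[i][j]."""
    m, n = len(hap), len(hap[0])
    p = [sum(hap[i]) for i in range(m)]
    q = [sum(hap[i][j] for i in range(m)) for j in range(n)]
    table = [[0.0] * n for _ in range(m)]
    overall = 0.0
    for i in range(m):
        for j in range(n):
            d = hap[i][j] - p[i] * q[j]
            if d < 0.0:
                d_max = min(p[i] * q[j], (1.0 - p[i]) * (1.0 - q[j]))
            else:
                d_max = min(p[i] * (1.0 - q[j]), (1.0 - p[i]) * q[j])
            # Guard against a zero bound (an allele with frequency 0 or 1).
            table[i][j] = d / d_max if d_max > 0.0 else 0.0
            overall += p[i] * q[j] * abs(table[i][j])
    return table, overall


# Hypothetical haplotype tables: partial and complete disequilibrium.
print(d_prime([[0.4, 0.1], [0.2, 0.3]]))
print(d_prime([[0.5, 0.0], [0.0, 0.5]]))
```

Because the weights $p_i q_j$ sum to one, the overall value is a weighted average of the per-pair magnitudes and stays between 0 and 1.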

### Computing D-Prime

For EM, the above formula is used directly on the values of $p_i$, $q_j$, and $D_{ij}$, where the $D_{ij}$ are computed from the haplotype probabilities imputed using the technique of *Computing LD using Expectation-Maximization (EM)*.

For multi-allelic CHM, we use

if , and

otherwise, with the overall D-prime statistic being defined as

For bi-allelic CHM, we use the same formulas as for multi-allelic CHM, except that for the final $D'$, we take the original overall $D'$ obtained as above and use a Hardy-Weinberg correction on it:

where , , and
are defined as in *Bi-Allelic*.