# LD Reports

## LD Reports Overview

LD Analysis can be performed in a variety of ways. A user can choose to calculate and output LD between all marker pairs, only pairs within gene regions or only adjacent pairs. These options are described in more detail below.

All of these functions can be found in a spreadsheet’s **Genotype** menu,
under **LD Reports**.

## LD Adjacent Pairs Analysis

LD Analysis is performed on all adjacent pairs within a chromosome (if a marker map is applied) or within a haplotype block. The results contain values for both the EM and CHM methods and both R2 and D’ values as well as the intermediate signed D statistic from the CHM method. The spreadsheet may include both genotype and phenotype data and may or may not have a marker map applied.

Optionally, the user may choose to use a haplotype block spreadsheet. In this case, only markers found in the row labels of the haplotype block spreadsheet will be considered in the LD calculations.

Another way to limit the number of calculations is to subset the
desired markers within an LD plot. To do this, first open an LD plot
and create a block of markers. Under the Haplotype Block Set
attributes, select **Subset Markers**. Run the function from the
resulting spreadsheet.

The row labels and the first column of the output list the two markers that were compared. All adjacent pairs are compared, and each pair is listed only once. The remaining columns give the distance in markers, the distance in Kb (if the genotype spreadsheet was marker mapped), and both R2 and D’ values for the EM method and the CHM method.
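As a rough illustration of the quantities in this report, the sketch below computes the signed D, D’ and R2 for each adjacent pair directly from phased 0/1 haplotypes. It assumes haplotype phase is already known, so it does not reproduce the EM or CHM estimation. The function names and the toy data are hypothetical.

```python
import numpy as np


def ld_stats(hap_a, hap_b):
    """Return (D, D', R^2) for two biallelic markers from phased 0/1 haplotypes."""
    hap_a = np.asarray(hap_a, dtype=float)
    hap_b = np.asarray(hap_b, dtype=float)
    p_a = hap_a.mean()              # frequency of allele 1 at the first marker
    p_b = hap_b.mean()              # frequency of allele 1 at the second marker
    p_ab = (hap_a * hap_b).mean()   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b            # signed D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                  # a monomorphic marker: D' and R^2 undefined
        return d, float("nan"), float("nan")
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / denom


def adjacent_pairs(haplotypes, positions_bp):
    """One row per adjacent pair; markers are assumed sorted by position."""
    rows = []
    for i in range(haplotypes.shape[1] - 1):
        d, d_prime, r2 = ld_stats(haplotypes[:, i], haplotypes[:, i + 1])
        dist_kb = (positions_bp[i + 1] - positions_bp[i]) / 1000.0
        # (first marker, second marker, distance in markers, Kb, R^2, D', D)
        rows.append((i, i + 1, 1, dist_kb, r2, d_prime, d))
    return rows


# Toy data: 4 haplotypes (rows) by 3 markers (columns).
haplotypes = np.array([[0, 0, 0],
                       [0, 0, 1],
                       [1, 1, 0],
                       [1, 1, 1]])
positions_bp = [1000, 3000, 3500]
rows = adjacent_pairs(haplotypes, positions_bp)
for row in rows:
    print(row)
```

With m markers on a chromosome this produces m - 1 rows, which is why the adjacent pairs report stays small even for large marker sets.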

## LD Pairwise Analysis Matrix

LD Analysis is performed on all pairs within a chromosome (if a marker map is applied) or within a haplotype block. This function creates five spreadsheets in matrix form (markers as the row labels and column headers). The spreadsheets contain values for both the EM and CHM methods and both R2 and D’ values as well as the intermediate CHM signed D statistic.

Optionally, the user may choose to use a haplotype block spreadsheet. In this case, only markers found in the row labels of the haplotype block spreadsheet will be considered in the LD calculations.

Another way to limit the number of calculations is to subset the
desired markers within an LD plot. To do this, first open an LD plot
and create a block of markers. Under the Haplotype Block Set
attributes, select **Subset Markers**. Run the function from the
resulting spreadsheet.

This function is not optimized for whole genome datasets. We recommend limiting your dataset in one of the ways described above, or simply activating only the markers of interest.
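To give a feel for the matrix layout and for how quickly the number of pairs grows, here is a minimal sketch. It uses the squared Pearson correlation of 0/1/2 allele counts as a stand-in for R2; this is an assumption made for illustration and is not claimed to match the EM or CHM calculations. The function names and the toy genotypes are hypothetical.

```python
import numpy as np


def pairwise_r2_matrix(genotypes):
    """Markers-by-markers matrix of squared genotype correlations.

    genotypes: (n_samples, n_markers) array of allele counts 0/1/2 with no
    missing values. Monomorphic markers would produce NaN entries.
    """
    g = np.asarray(genotypes, dtype=float)
    r = np.corrcoef(g, rowvar=False)   # columns are the variables (markers)
    return r ** 2


def n_pairs(n_markers):
    """Number of distinct marker pairs in the matrix."""
    return n_markers * (n_markers - 1) // 2


# Toy data: 4 samples (rows) by 3 markers (columns).
genotypes = np.array([[0, 0, 0],
                      [1, 1, 2],
                      [2, 2, 0],
                      [1, 1, 2]])
matrix = pairwise_r2_matrix(genotypes)
print(np.round(matrix, 3))
print(n_pairs(500_000))   # pairs needed for a 500,000-marker dataset
```

Because the pair count grows with the square of the number of markers, restricting the analysis to a haplotype block or to a subset of active markers reduces the work dramatically.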

## LD Pairwise Analysis within Gene Regions

LD Analysis is performed on all pairs within a gene region, specified by a user-selected annotation track. The resulting spreadsheet contains values for both the EM and CHM methods and both R2 and D’ values as well as the intermediate CHM signed D statistic.

The result is a new spreadsheet whose first two columns list the markers that were compared. All pairwise comparisons within each gene are made, and each pair is listed only once. The remaining columns give the gene, the distance in markers, the distance in Kb (if the genotype spreadsheet was marker mapped), both R2 and D’ values for the EM method and the CHM method, and the signed D value from the CHM method.
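The pair enumeration described above can be sketched as follows. This is only an illustration of which pairs end up in the report: the marker names, the gene name, the tuple layouts and the function name are all hypothetical, and the LD statistics themselves are left out.

```python
from itertools import combinations


def pairs_within_genes(markers, genes):
    """List each within-gene marker pair once.

    markers: list of (name, chromosome, position_bp)
    genes:   list of (gene, chromosome, start_bp, end_bp), as might be read
             from an annotation track
    Returns rows of (marker1, marker2, gene, distance in markers, distance in Kb).
    """
    rows = []
    for gene, chrom, start, end in genes:
        # Markers inside this gene region, sorted by position.
        inside = sorted(
            (pos, name)
            for name, m_chrom, pos in markers
            if m_chrom == chrom and start <= pos <= end
        )
        for (i, (pos1, name1)), (j, (pos2, name2)) in combinations(
            enumerate(inside), 2
        ):
            rows.append((name1, name2, gene, j - i, (pos2 - pos1) / 1000.0))
    return rows


markers = [("rs1", "1", 1000), ("rs2", "1", 2500),
           ("rs3", "1", 4000), ("rs4", "1", 9000)]
genes = [("GENE_A", "1", 500, 5000)]
gene_rows = pairs_within_genes(markers, genes)
for row in gene_rows:
    print(row)
```

Here `rs4` falls outside the gene region, so only the three pairs among `rs1`, `rs2` and `rs3` are listed.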

## LD Pairwise Analysis

LD Analysis is performed on all pairs within a chromosome (if a marker map is applied) or within a haplotype block. The resulting spreadsheet contains values for both the EM and CHM methods and both R2 and D’ values as well as the intermediate signed D statistic from the CHM method.

Optionally, the user may choose to use a haplotype block spreadsheet. In this case, only markers found in the row labels of the haplotype block spreadsheet will be considered in the LD calculations.

Another way to limit the number of calculations is to subset the
desired markers within an LD plot. To do this, first open an LD plot
and create a block of markers. Under the Haplotype Block Set
attributes, select **Subset Markers**. Run the function from the
resulting spreadsheet.

This function is not optimized for whole genome datasets. We recommend limiting your dataset in one of the ways described above, or simply activating only the markers of interest.

## Nonlinear Regression of LD R Squared on Distances

This feature fits a nonlinear curve to a scatter plot of $R^2$
against distance. The spreadsheet should have at least one column with distances
between pairs of markers (in kb) and one column with $R^2$ values.
This can be performed on the output spreadsheet obtained by running **LD Adjacent
Pairs Analysis**.

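The curve is defined as [Remington2001]:

$$E(R^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right]$$

where $C = 4N_e r$, $N_e$ is the effective population size, $r$ is the recombination fraction between sites, and $n$ is the sample size (number of sampled chromosomes). This model assumes a low level of mutation and accounts for sample size.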

The nonlinear model based on this expectation has a single coefficient: the least squares estimate of $C$ per bp of distance between sites.

The curve is fitted using the scipy function scipy.optimize.curve_fit.
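A fit of this kind can be sketched as below. The sketch assumes the commonly cited form of the expectation used with [Remington2001], written out in `expected_r2`, and treats the coefficient as $C$ per kb so that it can be applied directly to a distance column in kb; dividing by 1000 gives the per-bp value. The function names, the starting value, the bounds and the synthetic data are illustrative assumptions, not the software's actual settings.

```python
import numpy as np
from scipy.optimize import curve_fit


def expected_r2(c, n):
    """Assumed expectation of R^2 for scaled recombination c and n chromosomes."""
    return ((10 + c) / ((2 + c) * (11 + c))) * (
        1 + ((3 + c) * (12 + 12 * c + c ** 2)) / (n * (2 + c) * (11 + c))
    )


def fit_ld_decay(dist_kb, r2, n, start=0.1):
    """Least squares estimate of C per kb (divide by 1000 for C per bp)."""
    def model(d, c_per_kb):
        return expected_r2(c_per_kb * d, n)

    # Bounds keep the coefficient non-negative during the search.
    popt, _ = curve_fit(model, dist_kb, r2, p0=[start], bounds=(0, np.inf))
    return popt[0]


# Synthetic example data (not from any real study).
rng = np.random.default_rng(0)
n_chromosomes = 100
dist_kb = np.linspace(0.1, 100.0, 200)
true_c_per_kb = 0.5
r2 = expected_r2(true_c_per_kb * dist_kb, n_chromosomes)
r2_noisy = np.clip(r2 + rng.normal(0.0, 0.02, r2.size), 0.0, 1.0)

c_per_kb = fit_ld_decay(dist_kb, r2_noisy, n_chromosomes)
print("estimated C per kb:", c_per_kb)
print("estimated C per bp:", c_per_kb / 1000.0)
```

Fitting in kb rather than bp keeps the coefficient near order one, which tends to make the numerical optimization better conditioned.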