LD Reports

LD Reports Overview

LD Analysis can be performed in a variety of ways. A user can choose to calculate and output LD between all marker pairs, only pairs within gene regions or only adjacent pairs. These options are described in more detail below.

All of these functions can be found in a spreadsheet’s Genotype menu, under LD Reports.

LD Adjacent Pairs Analysis

LD Analysis is performed on all adjacent pairs within a chromosome (if a marker map is applied) or within a haplotype block. The results contain R^2 and D’ values from both the EM and CHM methods, as well as the intermediate signed D statistic from the CHM method. The spreadsheet may include both genotype and phenotype data and may or may not have a marker map applied.
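For orientation, the sketch below uses the standard textbook definitions of D, D’ and R^2 for a single pair of biallelic markers. It assumes the haplotype frequency is already known, so it shows neither the EM nor the CHM estimation step. The function name and its arguments are hypothetical and are not part of the software.

```python
def ld_from_haplotype_freqs(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the first marker
          and allele B at the second marker (assumed known here).
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # signed D

    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


d, d_prime, r2 = ld_from_haplotype_freqs(p_ab=0.4, p_a=0.5, p_b=0.5)
```

Here D’ is reported as an absolute value; some texts keep the sign of D instead.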

Optionally, the user may choose to use a haplotype block spreadsheet. In this case, only markers found in the row labels of the haplotype block spreadsheet will be considered in the LD calculations.

Another way to limit the number of calculations is to subset the desired markers within an LD plot. To do this, first open an LD plot and create a block of markers. Under the Haplotype Block Set attributes, select Subset Markers. Run the function from the resulting spreadsheet.

The output lists each compared pair of markers in the row labels and first column (every adjacent pair appears only once), followed by the distance in markers, the distance in Kb (if the genotype spreadsheet was marker mapped), and the R^2 and D’ values for both the EM method and the CHM method.
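The pairing and distance columns described above can be sketched as follows. This is only an illustration: the marker map below is made up, the function name is hypothetical, and the distance in markers is assumed to be 1 for every adjacent pair.

```python
from itertools import groupby

# Hypothetical marker map: (marker name, chromosome, position in bp).
marker_map = [
    ("rs1", "1", 10_000),
    ("rs2", "1", 12_500),
    ("rs3", "1", 40_000),
    ("rs4", "2", 5_000),
    ("rs5", "2", 9_000),
]


def adjacent_pairs(markers):
    """Yield (first marker, second marker, distance in markers, distance in kb)
    for each pair of neighbouring markers on the same chromosome."""
    ordered = sorted(markers, key=lambda m: (m[1], m[2]))
    for _, group in groupby(ordered, key=lambda m: m[1]):
        group = list(group)
        # Pair each marker with its immediate successor; pairs never span chromosomes.
        for (name_a, _, pos_a), (name_b, _, pos_b) in zip(group, group[1:]):
            yield name_a, name_b, 1, (pos_b - pos_a) / 1000.0


rows = list(adjacent_pairs(marker_map))
```

With five markers on two chromosomes this yields three rows, since no pair is formed across the chromosome boundary.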

LD Pairwise Analysis Matrix

LD Analysis is performed on all pairs within a chromosome (if a marker map is applied) or within a haplotype block. This function creates five spreadsheets in matrix form (markers as the row labels and column headers). The spreadsheets contain the R^2 and D’ values from both the EM and CHM methods, as well as the intermediate CHM signed D statistic.

Optionally, the user may choose to use a haplotype block spreadsheet. In this case, only markers found in the row labels of the haplotype block spreadsheet will be considered in the LD calculations.

Another way to limit the number of calculations is to subset the desired markers within an LD plot. To do this, first open an LD plot and create a block of markers. Under the Haplotype Block Set attributes, select Subset Markers. Run the function from the resulting spreadsheet.

This function is not optimized for whole genome datasets. We recommend limiting your dataset in one of the ways described above or simply activating only the markers of interest.

LD Pairwise Analysis within Gene Regions

LD Analysis is performed on all pairs within a gene region, specified by a user-selected annotation track. The resulting spreadsheet contains R^2 and D’ values from both the EM and CHM methods, as well as the intermediate CHM signed D statistic.

The result is a new spreadsheet whose first two columns list the markers that were compared (all pairwise comparisons within a gene, with each pair listed only once). The remaining columns give the gene, the distance in markers, the distance in Kb (if the genotype spreadsheet was marker mapped), the R^2 and D’ values for both the EM method and the CHM method, and the signed D value from the CHM method.

LD Pairwise Analysis

LD Analysis is performed on all pairs within a chromosome (if a marker map is applied) or within a haplotype block. The resulting spreadsheet contains R^2 and D’ values from both the EM and CHM methods, as well as the intermediate signed D statistic from the CHM method.

Optionally, the user may choose to use a haplotype block spreadsheet. In this case, only markers found in the row labels of the haplotype block spreadsheet will be considered in the LD calculations.

Another way to limit the number of calculations is to subset the desired markers within an LD plot. To do this, first open an LD plot and create a block of markers. Under the Haplotype Block Set attributes, select Subset Markers. Run the function from the resulting spreadsheet.

This function is not optimized for whole genome datasets. We recommend limiting your dataset in one of the ways described above or simply activating only the markers of interest.

Nonlinear Regression of LD R Squared on Distances

This feature fits a nonlinear curve to a scatter plot of r^2 against distance. The spreadsheet should have at least one column with distances between pairs of markers (in kb) and one column with r^2 values. This can be performed on the output spreadsheet obtained by running LD Adjacent Pairs Analysis.

The curve is defined as [Remington2001]:

E(r^2) = [ \frac {10 + C} {(2 + C)(11 + C)}][1 + \frac {(3 + C)(12 + 12C + C^2)} {n(2 + C)(11 + C)}]

where C = 4Nc, N is the effective population size, c is the recombination fraction between sites, and n is the sample size (number of sampled chromosomes). This model assumes a low level of mutation and accounts for sample size.

The nonlinear model based on this expectation has one coefficient: the least-squares estimate of 4Nc per bp of distance between sites.

The curve is fitted using the scipy function scipy.optimize.curve_fit.
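A minimal sketch of such a fit is shown below, assuming the distance (kb) and r^2 columns have been exported to arrays. The function names, the sample size of 100 chromosomes, the starting value, and the synthetic data are all hypothetical, and the non-negativity bound is a choice made for this illustration. The sketch fits the coefficient per kb and then converts it to a per-bp value; it is not necessarily how the software itself calls scipy.optimize.curve_fit.

```python
import numpy as np
from scipy.optimize import curve_fit


def expected_r2(distance_kb, rho_per_kb, n):
    """Expected r^2 at a given distance, with C = rho_per_kb * distance_kb
    and n the number of sampled chromosomes."""
    C = rho_per_kb * np.asarray(distance_kb, dtype=float)
    decay = (10.0 + C) / ((2.0 + C) * (11.0 + C))
    sample_term = 1.0 + ((3.0 + C) * (12.0 + 12.0 * C + C**2)) / (
        n * (2.0 + C) * (11.0 + C)
    )
    return decay * sample_term


def fit_ld_decay(distance_kb, r2, n, initial_guess=0.1):
    """Least-squares estimate of 4Nc per kb, constrained to be non-negative."""
    def model(d, rho):
        return expected_r2(d, rho, n)

    params, _ = curve_fit(
        model, distance_kb, r2, p0=[initial_guess], bounds=(0, np.inf)
    )
    return params[0]


# Synthetic, noise-free data for illustration only.
n_chromosomes = 100  # hypothetical sample size
distances_kb = np.linspace(0.1, 100.0, 200)
r2_values = expected_r2(distances_kb, 0.5, n_chromosomes)

rho_per_kb = fit_ld_decay(distances_kb, r2_values, n_chromosomes)
rho_per_bp = rho_per_kb / 1000.0  # convert the per-kb estimate to per bp
```

Keeping the distances in kb during the fit keeps the coefficient near order one, which tends to be friendlier to the optimizer than a very small per-bp value.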