# Genotype Association Tests

## Genotype Association Tests Overview

The Genotype Association Tests window offers a straightforward way of testing for genotypic association against either case/control status or a quantitative trait using one or more statistical measures under any one of several genotype model assumptions.

In addition, for most genetic models, the Genotype Association Test window offers stratification correction using one or more of the following methods:

- Principal Component Analysis (the “EIGENSTRAT” method) (See *Correction of Input Data by Principal Component Analysis*.)
- Genomic Control (See *Correcting for Stratification by Genomic Control*.)

Some tests have variations which use any missing data values you may
have in your genotypes as predictors. See the section
*Missing Values* for a discussion of the subject of including missing
values in tests performed by the Genotype Association Test dialog.

Finally, you may obtain overall marker statistics to be output along
with the association test results. (See *Overall Marker Statistics*.)

Note

For every individual marker, Golden Helix SVS will always display the number of sample data values which are actually used for testing that marker. Additionally, for case/control data, Golden Helix SVS will always display the number of case data values and control data values actually used for testing every individual marker.

Warning

Statistics calculated by this function do not adjust for gender and are therefore not always appropriate for non-autosomal chromosomes.

## Allele Classification for Genotype Association Tests

Golden Helix SVS lets you choose how alleles are classified. There are two options.

Alleles can be classified based on allele frequency. In this case, the major and minor alleles are determined from the allele frequencies observed in the spreadsheet data.

Alleles can be classified based on reference or alternate allele status as specified by a marker map field. This allows the association testing of DNA-Sequence data where the reference alleles are known for variants and all tests need to be in terms of the alternate allele(s) or alternate sequence. The reference field should only contain information about the reference allele(s) or sequence.
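As a rough illustration of frequency-based classification, the following Python sketch counts alleles for one bi-allelic marker and labels the less frequent one as the minor allele. The function name, the `_` separator, and the `?` missing-value symbol are assumptions made for this example; they are not taken from Golden Helix SVS.

```python
from collections import Counter

def classify_by_frequency(genotypes, sep="_", missing="?"):
    """Return (minor, major) alleles for one bi-allelic marker.

    Hypothetical helper: the separator and missing symbol are assumptions.
    """
    counts = Counter()
    for genotype in genotypes:
        for allele in genotype.split(sep):
            if allele != missing:
                counts[allele] += 1
    ranked = counts.most_common()  # most frequent allele first
    if len(ranked) != 2:
        raise ValueError("expected exactly two observed alleles")
    major, minor = ranked[0][0], ranked[1][0]
    return minor, major

genotypes = ["d_d", "D_d", "d_d", "D_D", "?_?", "d_d"]
minor, major = classify_by_frequency(genotypes)
print(minor, major)
```

With equal allele counts the choice of minor allele in this sketch depends on the order in which alleles are first seen; how SVS breaks such ties is not stated here.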

Note

In the case that a variant is an insertion or deletion the reference “allele” is actually a sequence of alleles or the alternate “allele” is actually a sequence of alleles. For purposes of analysis a sequence of alleles is treated the same as an allele. It is either the “reference allele” sequence or the “alternate allele” sequence.

Note

Golden Helix SVS will always display the test allele used (minor allele or alternate allele), as well as the other allele involved in the test (major allele or reference allele).

## Genotype Models and Other Genotype Tests

Golden Helix SVS will perform tests based upon one genotype model or other grouping of genotype information. These models and other genotypic tests are as follows:

- Basic Allelic Tests
- Genotypic Tests
- Additive Model
- Dominant Model
- Recessive Model

These tests and models are described below.

### Basic Allelic Tests

For a basic allelic test, the genotypes dd, Dd, and DD (or rr, Ar, and AA) are resolved into pairs of alleles d and d, D and d, or D and D (or r and r, A and r, or A and A). Both elements of each subject’s genotype are considered to correspond to the same value of the dependent variable. The associations with these individual alleles are then tested.

For example, examine the following case/control dependent variable and genotype variable columns. The allele frequency notation is used, but the idea is the same for reference/alternate classification:

| Case/Control | Genotype |
|---|---|
| 0 | d_d |
| 1 | D_d |
| 1 | D_D |

These would be translated to:

| Case/Control | Allele |
|---|---|
| 0 | d |
| 0 | d |
| 1 | D |
| 1 | d |
| 1 | D |
| 1 | D |

and the following quantitative phenotype dependent variable and genotype variable columns

| Phenotype | Genotype |
|---|---|
| 0.6 | d_d |
| 2.9 | D_d |
| 1.7 | D_D |

would be translated to:

| Phenotype | Allele |
|---|---|
| 0.6 | d |
| 0.6 | d |
| 2.9 | D |
| 2.9 | d |
| 1.7 | D |
| 1.7 | D |
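The translation from genotype rows to allele rows can be sketched in a few lines of Python. This is only an illustration of the idea; the function name and the `_` separator are assumptions, not part of SVS.

```python
def expand_to_alleles(rows, sep="_"):
    """Turn (dependent_value, genotype) rows into one row per allele."""
    return [
        (value, allele)
        for value, genotype in rows
        for allele in genotype.split(sep)
    ]

rows = [(0, "d_d"), (1, "D_d"), (1, "D_D")]
allele_rows = expand_to_alleles(rows)
for value, allele in allele_rows:
    print(value, allele)
```

Each subject contributes two rows that share the same dependent value, which is why the number of observations doubles.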

The advantage of this test model is the number of observations has been doubled.

The disadvantage is that the genotype-specific information, such as which alleles are paired together, is ignored.

A further disadvantage of basic allele testing is that stratification correction through the Principal Components Analysis method is not available for this model.

### Genotypic Tests

“Genotypic Tests” refer to testing on the genotypes dd, Dd, and DD (or rr, Ar, and AA if classified according to reference vs. alternate) without regard to any “order” or allelic count or allelic pairing they might have.

These tests can reveal associations without regard to any specific genotype model. No associations are “hidden” because no model is assumed.

However, stratification correction through the Principal Component Analysis method is not available for this model.

### Additive Model

Under this model, testing is designed specifically to reveal associations which depend additively based on the allele classification.

If the alleles are classified according to allele frequency then the associations depend additively on the minor allele–that is, where having two minor alleles (DD) rather than having no minor alleles (dd) is twice as likely to affect the outcome in a certain direction as is having just one minor allele (Dd) rather than no minor alleles (dd).

If the alleles are classified according to reference or alternate allele or allele sequences the associations depend additively on the alternate allele sequence–that is, where having two alternate alleles (AA) rather than having no alternate alleles (rr) is twice as likely to affect the outcome in a certain direction as is having just one alternate allele (Ar) rather than no alternate alleles (rr).

Note

For a case-control response, two odds-ratio tests (see
*Odds Ratios with Confidence Limits*) are available under this model. These tests,
which are not really part of the additive model, as such, are not
only indicators of the intensity of any association, but are also a
check on the validity of the additive model itself in describing
the effect.

### Dominant Model

If the alleles are classified according to allele frequency then this model specifically tests the association of having at least one minor allele D (either Dd or DD) versus not having it at all (dd).

If the alleles are classified according to reference/alternate alleles then this model specifically tests the association of having at least one alternate allele A (either Ar or AA) versus not having it at all (rr).

### Recessive Model

If the alleles are classified according to allele frequency then this model specifically tests the association of having the minor allele D as both alleles (DD) versus having at least one major allele d (Dd or dd).

If the alleles are classified according to reference/alternate alleles then this model specifically tests the association of having the alternate allele A as both alleles (AA) versus having at least one reference allele r (Ar or rr).

## Test Statistics

Golden Helix SVS can perform or output results from the following statistical tests where appropriate:

- Correlation/Trend Test
- Armitage Trend Test
- Exact Form of Armitage Test
- (Pearson) Chi-Squared Test
- (Pearson) Chi-Squared Test with Yates’ Correction
- Fisher’s Exact Test
- Odds Ratio with Confidence Limits
- Analysis of Deviance
- F-Test
- Logistic Regression
- Linear Regression

These are described below.

### Correlation/Trend Test

This test (which is not available when missing values are used as predictors) is available for both case/control and quantitative dependent variables for every genetic model or test except the genotype model.

Also, this is the only test (besides logistic or linear regression)
which is available if Principal Components Analysis (PCA) is used for
stratification correction on the input data. See *Principal Component Analysis* for more
information.

This test will show the p-value for the (possibly PCA-corrected) dependent variable value having any correlation with, or “trend” which depends upon, the (possibly PCA-corrected) count value of the genotype. (See below.)

For case/control dependent variables, and before any PCA correction, a “case” is considered to have a value of one, and a “control” is considered to have a value of zero.

For the genotype predictor variable, its count values (before any PCA correction) are as follows:

- Alleles classified according to allele frequency:
  - **Additive Model:** The count of the minor allele D, which is zero within genotype dd, one within genotype Dd, and two within genotype DD, where d is the major allele.
  - **Dominant Model:** The count is one for genotypes DD and Dd and zero for genotype dd.
  - **Recessive Model:** The count is one for genotype DD and zero for genotypes Dd and dd.

- Alleles classified according to reference/alternate alleles:
  - **Additive Model:** The count of the alternate allele A, which is zero within genotype rr, one within genotype Ar, and two within genotype AA, where r is the reference allele.
  - **Dominant Model:** The count is one for genotypes AA and Ar and zero for genotype rr.
  - **Recessive Model:** The count is one for genotype AA and zero for genotypes Ar and rr.
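The count values listed above can be expressed as a small Python function. This is a sketch for illustration; the function name, the model strings, and the `_` separator are assumptions.

```python
def genotype_count(genotype, test_allele, model, sep="_"):
    """Numeric predictor value for one genotype under a genetic model.

    test_allele is the minor allele (D) or the alternate allele (A).
    """
    n = genotype.split(sep).count(test_allele)  # 0, 1, or 2 copies
    if model == "additive":
        return n
    if model == "dominant":
        return 1 if n >= 1 else 0
    if model == "recessive":
        return 1 if n == 2 else 0
    raise ValueError(f"unknown model: {model}")

for model in ("additive", "dominant", "recessive"):
    print(model, [genotype_count(g, "D", model) for g in ("d_d", "D_d", "D_D")])
```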

In addition, this test will show a signed correlation value indicating the amount and direction of dependency of the (possibly PCA-corrected) count value on the (possibly PCA-corrected) dependent variable value.

Note

- This test yields p-value results very close to those obtained from the Armitage Trend Test, described below, in the special circumstance of an additive model where the dependent variable is case/control and no Principal Components Analysis correction is being done.
- The “Corr/Trend R” output from this test indicates the effect direction. (A positive direction means that a greater count of the minor or alternate allele, versus the major or reference allele, correlates with an increased effect.)

See the Formulas and Theories chapter for an explanation of this
statistic (*Correlation/Trend Test*).

### Armitage Trend Test

This test is available specifically under the additive model for a case/control dependent variable when missing data is dropped.

This test checks whether “case” versus “control” status shows a “trend” that depends on the count of the minor/alternate allele D/A, which is zero within genotype dd/rr, one within genotype Dd/Ar, and two within genotype DD/AA.
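For orientation, one common textbook form of the Cochran-Armitage trend statistic with scores 0, 1, 2 is the sample size times the squared Pearson correlation between case/control status (1/0) and the allele count, referred to a chi-square distribution with one degree of freedom. The Python sketch below uses that form as an assumption; the exact formula used by SVS is the one given in the Formulas and Theories chapter, and the function name here is hypothetical.

```python
import math

def trend_chi_square(status, counts):
    """Return (r, chi2, p) for a textbook trend test with scores 0/1/2."""
    n = len(status)
    mean_y = sum(status) / n
    mean_x = sum(counts) / n
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in zip(status, counts))
    sxx = sum((x - mean_x) ** 2 for x in counts)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        raise ValueError("status and counts must both vary")
    r = sxy / math.sqrt(sxx * syy)
    chi2 = n * r * r
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return r, chi2, p

status = [0, 0, 0, 0, 1, 1, 1, 1]   # control = 0, case = 1
counts = [0, 0, 0, 1, 1, 1, 2, 2]   # copies of the test allele
r, chi2, p = trend_chi_square(status, counts)
print(round(r, 4), round(chi2, 4), round(p, 4))
```

The sign of `r` plays the same role as the effect direction described in the note that follows: a positive value means more copies of the test allele go with case status.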

Note

The “Armitage T” output from this test indicates the effect direction. (A positive direction means that a greater count of the minor or alternate allele, versus the major or reference allele, correlates with an increased effect.)

See the Formulas and Theories chapter for an explanation of this
statistic (*Armitage Trend Test*).

### Exact Form of Armitage Test

This test is available specifically under the additive model for a case/control dependent variable when missing data is dropped.

This exact test yields the probability under the null hypothesis of having a “trend” at least as extreme as the one observed, assuming an equal probability of any permutation of the dependent variable. This form, which is more computationally expensive than is the normal Armitage Trend Test, avoids the chi-square approximation used in that test.

Note

The “Armitage T (Observed)” output from this test indicates the effect direction. (A positive direction means that a greater count of the minor or alternate allele, versus the major or reference allele, correlates with an increased effect.)

See the Formulas and Theories chapter for an explanation of this
statistic (*Exact Form of Armitage Test*).

### (Pearson) Chi-Squared Test

The Pearson Chi-Squared test is available for a case/control dependent variable for all genetic models and tests except the Additive Model, and is available whether missing values are used or dropped.

This test is on the observed contingency table versus the expected contingency table created with all the possible variations of the selected model in one direction versus the case/control status in the other direction, keeping the margins constant.

The respective contingency tables and their dimensions when dropping missing values are as follows (allele frequency classification is used for demonstration purposes):

| Genetic Model or Test | Contingency Table and Dimension |
|---|---|
| Basic Allelic Test | (Case/Control) vs. (D/d), a 2 × 2 table |
| Genotypic Test | (Case/Control) vs. (DD/Dd/dd), a 2 × 3 table |
| Dominant Model | (Case/Control) vs. ({DD or Dd}/dd), a 2 × 2 table |
| Recessive Model | (Case/Control) vs. (DD/{Dd or dd}), a 2 × 2 table |

If you have chosen **Use Missing Values As Predictors**, the respective
expanded contingency tables and their dimensions become as follows:

| Genetic Model or Test | Contingency Table and Dimension |
|---|---|
| Basic Allelic Test | (Case/Control) vs. (D/d/missing), a 2 × 3 table (two missing values are used for every missing genotype) |
| Genotypic Test | (Case/Control) vs. (DD/Dd/dd/missing), a 2 × 4 table |
| Dominant Model | (Case/Control) vs. ({DD or Dd}/dd/missing), a 2 × 3 table |
| Recessive Model | (Case/Control) vs. (DD/{Dd or dd}/missing), a 2 × 3 table |
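To make the comparison of observed and expected tables concrete, here is a minimal Python sketch of the standard Pearson chi-squared statistic for a table of counts. The expected count in each cell is computed from the fixed margins. The function name is hypothetical and the sketch assumes every row and column total is non-zero.

```python
def pearson_chi_square(table):
    """Return (chi2, degrees_of_freedom) for a rows x columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence, keeping the margins constant.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

# Genotypic Test: rows are case/control, columns are DD/Dd/dd.
table = [[10, 20, 30],
         [30, 20, 10]]
chi2, df = pearson_chi_square(table)
print(chi2, df)
```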

Note

This test additionally yields a “Correlation R” output when the Basic Allelic Test, the Dominant Model, or the Recessive Model is used, and when missing values are not being used as predictors. This output indicates the effect direction. (A positive direction means that the minor or alternate allele correlates with an increased effect versus the major or reference allele.)

See the Formulas and Theories chapter for an explanation of this
statistic (*(Pearson) Chi-Squared Test*).

### (Pearson) Chi-Squared Test with Yates’ Correction

The Pearson Chi-Squared test with Yates’ correction is available for a case-control dependent variable for all genetic models and tests except the Additive Model, and is available whether missing values are used or dropped.

Just as in the uncorrected Pearson Chi-Squared test, this test is on the observed contingency table versus the expected contingency table created with all the possible variations of the selected model in one direction versus the case/control status in the other direction, keeping the margins constant.

The respective contingency tables and their dimensions are the same as
for the uncorrected Pearson Chi-Squared test. Please see
*(Pearson) Chi-Squared Test*.

The difference between the two tests is that the Yates-corrected test subtracts 0.5 from the absolute magnitude of the difference between the observed and the expected value for each cell before squaring and dividing by the expected value. This correction, which almost always makes the result more conservative, is meant to compensate for the fact that discrete integer values rather than continuous values are used in the contingency table.
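The effect of the correction can be seen in a short Python sketch that computes the statistic with and without subtracting 0.5, following the description above literally. The function name and the `yates` flag are assumptions for this example, and the sketch does not handle the case where the absolute difference is already smaller than 0.5.

```python
def chi_square(table, yates=False):
    """Pearson chi-squared statistic, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                diff -= 0.5  # continuity correction applied before squaring
            statistic += diff ** 2 / expected
    return statistic

# Dominant Model: rows are case/control, columns are {DD or Dd}/dd.
table = [[15, 5],
         [5, 15]]
plain = chi_square(table)
corrected = chi_square(table, yates=True)
print(plain, corrected)
```

In this example the corrected statistic is smaller than the uncorrected one, which corresponds to a larger, more conservative p-value.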

Note

This test additionally yields a “Correlation R” output when the Basic Allelic Test, the Dominant Model, or the Recessive Model is used, and when missing values are not being used as predictors. This output indicates the effect direction. (A positive direction means that the minor or alternate allele correlates with an increased effect versus the major or reference allele.)

See the Formulas and Theories chapter for a more detailed explanation
of this statistic (*(Pearson) Chi-Squared Test with Yates’ Correction*).

### Fisher’s Exact Test

The Fisher’s exact test is also available for a case/control dependent variable for all genotype models and tests except the Additive Model, and is available whether missing values are used or dropped.

This test yields the exact probability under the null hypothesis of having a contingency table at least as extreme as the one observed, assuming an equal probability of any permutation of the dependent variable. This test, which is more computationally expensive than the Pearson Chi-Squared test, avoids the chi-square approximation altogether.
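For the 2 × 2 case, the exact probability can be sketched in Python using the hypergeometric distribution. This sketch assumes the common convention that “at least as extreme” means “no more probable than the observed table”; it covers only 2 × 2 tables, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

p_value = fisher_exact_2x2(3, 1, 1, 3)
print(p_value)
```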

See *(Pearson) Chi-Squared Test* above for a listing of the possible contingency
tables.

Note

This test additionally yields a “Correlation R” output when the Basic Allelic Test, the Dominant Model, or the Recessive Model is used, and when missing values are not being used as predictors. This output indicates the effect direction. (A positive direction means that the minor or alternate allele correlates with an increased effect versus the major or reference allele.)

See the Formulas and Theories chapter for an explanation of this
statistic (*Fisher’s Exact Test*).

### Odds Ratios with Confidence Limits

If you have a case/control dependent variable, you are dropping missing data, and you are using any model or test other than the Genotypic Test, you may select to output odds ratios and the lower and upper 95% confidence bounds for each under the following models:

Alleles classified according to allele frequency:

- **Basic Allelic Tests**: The odds ratio for the minor allele enhancing the effect, and the odds ratio for the major allele enhancing the effect.
- **Dominant Model**: The “normal” odds ratio ({DD or Dd}/dd, where D is the minor allele and d is the major allele) and an inverse odds ratio (dd/{DD or Dd}).
- **Recessive Model**: The “normal” odds ratio (DD/{Dd or dd}) and an inverse odds ratio ({Dd or dd}/DD).
- **Additive Model**: The odds ratio for Dd/dd (heterozygous vs homozygous major allele) and the odds ratio for DD/Dd (homozygous minor allele vs heterozygous).

Note

Under this model, the two odds ratios may be thought of as a check on the validity of the model itself in describing the effect, as well as indicators of the intensity of the association. If the two odds ratios are approximately the same, then the additive model may be considered valid. If the two odds ratios are very different, then there may be some other model better describing the data. For instance, a high and significant odds ratio for Dd/dd and a low or insignificant odds ratio for DD/Dd may indicate the dominant model more accurately describes the effect.

Alleles classified according to reference/alternate alleles:

- **Basic Allelic Tests**: The odds ratio for the alternate allele enhancing the effect, and the odds ratio for the reference allele enhancing the effect.
- **Dominant Model**: The “normal” odds ratio ({AA or Ar}/rr, where A is the alternate allele and r is the reference allele) and an inverse odds ratio (rr/{AA or Ar}).
- **Recessive Model**: The “normal” odds ratio (AA/{Ar or rr}) and an inverse odds ratio ({Ar or rr}/AA).
- **Additive Model**: The odds ratio for Ar/rr (heterozygous vs homozygous reference allele) and the odds ratio for AA/Ar (homozygous alternate allele vs heterozygous).

Note

Under this model, the two odds ratios may be thought of as a check on the validity of the model itself in describing the effect, as well as indicators of the intensity of the association. If the two odds ratios are approximately the same, then the additive model may be considered valid. If the two odds ratios are very different, then there may be some other model better describing the data. For instance, a high and significant odds ratio for Ar/rr and a low or insignificant odds ratio for AA/Ar may indicate the dominant model more accurately describes the effect.

Note

An odds ratio is generally considered significant if both the lower and the upper 95% confidence bounds are greater than one (or both less than one for an odds ratio less than one).
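As a worked illustration of an odds ratio with 95% confidence bounds, the Python sketch below uses the widely taught log-based (Woolf) approximation. Whether SVS uses exactly this approximation is an assumption; the formula SVS applies is documented in the Formulas and Theories chapter. The function and argument names are hypothetical.

```python
import math

def odds_ratio_with_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Odds ratio and approximate 95% bounds from a 2 x 2 table of counts."""
    cells = (case_with, case_without, control_with, control_without)
    if min(cells) <= 0:
        raise ValueError("this approximation needs four positive counts")
    odds_ratio = (case_with * control_without) / (case_without * control_with)
    # Standard error of log(odds ratio) under the Woolf approximation.
    se = math.sqrt(sum(1 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Dominant Model: "with" means genotype DD or Dd, "without" means dd.
odds_ratio, lower, upper = odds_ratio_with_ci(20, 10, 10, 20)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

Here both bounds lie above one, so by the rule in the note above this odds ratio would be considered significant.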

See the Formulas and Theories chapter for an explanation of this
statistic (*Odds Ratio with Confidence Limits*).

### Analysis of Deviance

This test is available for a case/control dependent variable for all genotype models and tests except the Additive Model, and is available whether missing values are used or dropped.

It is a first-order equivalent alternative statistic for testing an observed contingency table versus the expected contingency table. The test is created with all the possible variations of the selected model in one direction versus “case” or “control” status in the other direction.

See *(Pearson) Chi-Squared Test* above for a listing of the possible contingency
tables.

This test has somewhat more theory in its foundation than does the
Pearson Chi-Squared test (*(Pearson) Chi-Squared Test*) as it is a likelihood ratio
test, to which the Pearson test is a first-order approximation.
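The relationship between the two statistics can be illustrated with the standard likelihood-ratio (G) statistic for a contingency table, which is twice the sum of observed counts times the log of observed over expected. The sketch below assumes this textbook form; the function name is hypothetical, and cells with a zero observed count are taken to contribute nothing.

```python
import math

def deviance_statistic(table):
    """Likelihood-ratio statistic for a rows x columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2 * g

table = [[15, 5],
         [5, 15]]
g_statistic = deviance_statistic(table)
print(round(g_statistic, 4))
```

For this table the Pearson statistic is 10, and the likelihood-ratio statistic is close to it but not identical.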

Note

See the Formulas and Theories chapter for an explanation of this
statistic (*Analysis of Deviance*).

### F-Test

This is one of the three tests available for a quantitative dependent
variable. (The other two are the correlation/trend test, see *Correlation/Trend Test*,
and linear regression, see *Logistic/Linear Regression*.) The F-Test is available for
all genotype models and tests except the Additive Model, and is
available whether missing values are used or dropped.

It tests whether the distributions of the dependent variable within each category are significantly different between the various categories of the predictor variable.

The respective sets of categories when dropping missing values are as follows (classification of alleles by frequency used for demonstration purposes):

| Genetic Model or Test | Categories |
|---|---|
| Basic Allelic Test | D vs. d |
| Genotypic Test | DD vs. Dd vs. dd |
| Dominant Model | {DD or Dd} vs. dd |
| Recessive Model | DD vs. {Dd or dd} |

If you have chosen **Use Missing Values As Predictors**, the respective
expanded sets of categories become as follows:

| Genetic Model or Test | Categories |
|---|---|
| Basic Allelic Test | D vs. d vs. missing (two missing values are used for every missing genotype) |
| Genotypic Test | DD vs. Dd vs. dd vs. missing |
| Dominant Model | {DD or Dd} vs. dd vs. missing |
| Recessive Model | DD vs. {Dd or dd} vs. missing |
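A comparison of the dependent variable across categories can be sketched as a standard one-way analysis of variance F statistic. This is the textbook calculation, shown under the assumption that it matches the general idea of the test; the function name and sample values are hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for lists of dependent values."""
    all_values = [value for group in groups for value in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, means)
        for value in group
    )
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Dominant Model categories: {DD or Dd} vs. dd (made-up phenotype values).
carriers = [2.9, 1.7, 2.4]
non_carriers = [0.6, 1.1, 0.9]
f, df_between, df_within = one_way_f([carriers, non_carriers])
print(round(f, 3), df_between, df_within)
```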

Note

This test additionally yields a “Change in Dependent Average” output value, which will indicate the effect direction, when the Basic Allelic Test, the Dominant Model, or the Recessive Model is used, and when missing values are not being used as predictors. (A positive direction means that the average effect for the minor or alternate allele is higher than the average effect for the major or reference allele.)

See the Formulas and Theories chapter for an explanation of this
statistic (*F-Test*).

### Logistic/Linear Regression

When the dependent is a quantitative (real- or integer-valued) trait,
linear regression is available for every genetic model or test except
the genotypic model. With linear regression, a line is fit to the
response in terms of the predictor’s count value (see *Correlation/Trend Test*
above) according to the genetic model, and a p-value is computed for
goodness of fit. The output will include not only the regression p-value
but also the estimate for the intercept and slope of the regression.
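The intercept and slope of such a line can be obtained by ordinary least squares. The Python sketch below shows that calculation on the additive count values; it is an illustration only, omits the p-value, and uses a hypothetical function name.

```python
def fit_line(counts, values):
    """Least-squares (intercept, slope) of values regressed on counts."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    if sxx == 0:
        raise ValueError("the genotype counts must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Additive counts for dd, Dd, DD with the phenotype values used earlier.
intercept, slope = fit_line([0, 1, 2], [0.6, 2.9, 1.7])
print(round(intercept, 4), round(slope, 4))
```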

When the dependent is a binary trait, logistic regression is available for every genetic model or test except the genotypic model. With logistic regression, a logistic (sigmoid) curve is fit to the response in terms of the predictor’s count value, and a p-value is computed for goodness of fit. The output will include not only the regression p-value but also the estimates for the intercept and slope coefficients of the logistic model.

Bonferroni and False Discovery Rate (FDR) multiple testing corrections can also be applied to the regression results.

See the Formulas and Theories chapter for an explanation of this
statistic (*Linear Regression* and *Logistic Regression*).

## Missing Values

### Using Missing Values for Genotypes

Your data may have missing values for some of the genotypes. The default
for association testing and stratification correction is to drop these
missing values. However, sometimes it is desirable to test wholly or
partly on “predictive missingness”, that is, what dependency the
response may have on missing values. If you wish to include missing
values in the predictions, check **Use Missing Values As Predictors**.

Only the following statistical tests can use missing values as predictors:

- **Chi-Squared Test:** Takes a binary (case/control) dependent variable, see *(Pearson) Chi-Squared Test*.
- **Fisher’s Exact Test:** Takes a binary (case/control) dependent variable, see *Fisher’s Exact Test*.
- **Analysis of Deviance:** Takes a binary (case/control) dependent variable, see *Analysis of Deviance*.
- **F-Test:** Takes a quantitative dependent variable, see *F-Test*.

These test types do not impose anything resembling an “order” on the predictor values, and thus can work with missing data.

Note

No stratification correction is available when including missing data as predictors.

### Missing Values in the Dependent Variable

When you use a column containing missing values as the dependent variable, the rows containing these missing values in the dependent variable will not be used in association testing.

However, rows containing missing dependent values are still used in finding principal components and for obtaining genotype statistics.

### Importing Missing Values in a Case/Control Variable

If you have case/control data with some missing values, Golden Helix SVS version 7 and higher will import this column as “binary”. (Versions before 7 imported this column as “integer”.) Importing the column as binary ensures all case/control association tests will be available for the non-missing values of dependent columns which contain missing values in their data.

## Multiple Testing Corrections

It may be possible to obtain a good test statistic value by chance alone. Multiple testing corrections are designed to help ensure, if possible, that this is not the case. You may optionally select one or more of the following multiple testing corrections.

### Bonferroni Adjustment

The Bonferroni adjustment, for each type of statistical test selected, multiplies each individual p-value by the number of times this type of test was performed on any marker. The resulting value, which is quite conservative, seeks to estimate the probability that a test of this type would have obtained the same value by chance at least once from all the times this type of test was performed.

The number of bi-allelic markers processed is used as the number of times this type of test has been performed. Tests of other types on the same markers are not counted in the Bonferroni adjustment used for this type of test.
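The adjustment itself is a single multiplication per marker, as in this Python sketch. Capping the result at 1 is an assumption made here so the output remains a valid probability; the function name is hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of markers tested (capped at 1)."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# One p-value per bi-allelic marker, all from the same type of test.
adjusted = bonferroni([0.01, 0.04, 0.5])
print(adjusted)
```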

### False Discovery Rate

The False Discovery Rate (FDR) option calculates the FDR for each statistical test selected. This test is based on the p-values from the original test.

A general interpretation of the FDR is “What would the rate of false discoveries (false positives) be if I accepted ALL of the tests whose p-value is at or below the p-value of this test?”
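One widely used way to compute such a value is the Benjamini-Hochberg procedure, sketched below in Python. Treating this as the procedure SVS uses is an assumption; the procedure SVS applies is described in the Formulas and Theories chapter. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so results stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

fdr = benjamini_hochberg([0.01, 0.02, 0.03, 0.5])
print(fdr)
```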

See the Formulas and Theories chapter for an explanation of this
correction procedure (*False Discovery Rate*).

### Permutation Testing

Permutation testing is another way of determining if a significant test statistic value was obtained by chance alone.

Note

- Permutation testing is available only for non-exact tests. (Exact tests already use permutation techniques.)
- Genomic control is not available concurrently with permutation
testing. Genomic control works directly on the chi-square results
of those tests which incorporate a chi-square statistic. (If you
did do permutation testing after applying genomic control, you
would get all of the same answers, because genomic control is
applied using a constant multiplier on all of the chi-square
values.) (See
*Correcting for Stratification by Genomic Control*.)

#### Single Value Permutation Testing

With single value permutations, the dependent variable is permuted and the given statistical test using the given model on the given marker is performed. This process is repeated the number of times you select (counting the original test as one “permutation”). The permuted p-value is the fraction of permutations in which the test came out as significant as, or more significant than, it did with the non-permuted dependent variable.
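The procedure can be sketched in Python as follows. The statistic used here (the absolute covariance between the dependent variable and the genotype count) is a stand-in chosen for brevity, not one of the SVS tests, and the function names and the seeded random generator are assumptions for this example.

```python
import random

def abs_covariance(y, x):
    """Stand-in test statistic: larger means a stronger association."""
    mean_y = sum(y) / len(y)
    mean_x = sum(x) / len(x)
    return abs(sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x)))

def single_value_permutation_p(y, x, statistic, n_permutations, seed=0):
    rng = random.Random(seed)
    observed = statistic(y, x)
    hits = 1  # the original test counts as one "permutation"
    permuted = list(y)
    for _ in range(n_permutations - 1):
        rng.shuffle(permuted)  # permute the dependent variable only
        if statistic(permuted, x) >= observed - 1e-12:
            hits += 1
    return hits / n_permutations

status = [0] * 10 + [1] * 10
counts = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 2, 1, 1, 2, 0, 1, 2, 1, 1]
p_permuted = single_value_permutation_p(status, counts, abs_covariance, 200)
print(p_permuted)
```

Because the original test is counted, the smallest p-value this sketch can return is one divided by the number of permutations.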

#### Full Scan Permutation Testing

The full-scan permutation technique differs from the single-value technique in that it addresses the multiple testing problem. It does this by comparing the original test result from an individual marker with the most significant permuted results from all tested markers. The specified number of permutations are done on the dependent variable and these permutations are tested with each marker. For each permutation only the most significant result statistic of all markers tested with that permutation is saved.

The p-value is the fraction of permutations in which this best saved value of the test statistic was more significant than the original statistical test on the given marker.
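A minimal Python sketch of the full-scan idea is shown below. As in any such illustration, the statistic is a stand-in (absolute covariance) rather than an SVS test, ties are treated as not more significant, and the function names and seeded random generator are assumptions.

```python
import random

def abs_covariance(y, x):
    """Stand-in test statistic: larger means a stronger association."""
    mean_y = sum(y) / len(y)
    mean_x = sum(x) / len(x)
    return abs(sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x)))

def full_scan_permutation_p(y, markers, statistic, n_permutations, seed=0):
    rng = random.Random(seed)
    observed = [statistic(y, x) for x in markers]
    best_per_permutation = []
    permuted = list(y)
    for _ in range(n_permutations):
        rng.shuffle(permuted)
        # Keep only the most significant result across all markers.
        best_per_permutation.append(max(statistic(permuted, x) for x in markers))
    return [
        sum(best > obs + 1e-12 for best in best_per_permutation) / n_permutations
        for obs in observed
    ]

status = [0] * 6 + [1] * 6
markers = [
    [0, 0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 1],  # strongly associated marker
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],  # unrelated marker
    [1, 0, 1, 0, 2, 0, 0, 1, 0, 2, 1, 0],  # unrelated marker
]
p_values = full_scan_permutation_p(status, markers, abs_covariance, 200)
print(p_values)
```

Since every marker is compared against the same list of best permuted results, a marker with a stronger original statistic can never receive a larger full-scan p-value than a weaker one.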

See the Formulas and Theories chapter for a more detailed explanation
and examples of permutation testing (*Permutation Testing Methodology*).

## Stratification Correction

### Principal Components Analysis

To correct for stratification, batch effects, or other measurement
errors, you may choose to have Golden Helix SVS apply Principal Component
Analysis (PCA) to your input data as a part of the process of testing
it for associations. The corrected data, which you may request to be
output into a separate spreadsheet, is the same as that which could be
created through the separate PCA window. (See
*Correction of Input Data by Principal Component Analysis* and *Using the Genotypic Principal Components Analysis Window*.)

### Genomic Control

Genomic control is an alternative method that you may use for
stratification correction. Here, an “inflation factor” is either
inferred or externally specified. This “inflation factor” indicates
how much the distribution of statistics from the association tests is
spread out from what it should be, and will result in p-values that
are corrected to be more realistic (larger) than the original test
results. (See *Correcting for Stratification by Genomic Control*.)
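As a sketch of how an inflation factor can be inferred and applied, the Python example below follows a commonly described approach: divide the median of the observed chi-square statistics by the median of a chi-square distribution with one degree of freedom (about 0.4549), then divide every statistic by that factor. Whether SVS infers the factor in exactly this way, and the choice to never let it drop below 1, are assumptions made here; see the referenced section for the actual method. The function name is hypothetical.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution, 1 df

def genomic_control(chi2_values, inflation=None):
    """Return (inflation, corrected chi-squares, corrected p-values)."""
    if inflation is None:
        # Inferred factor; floored at 1 so p-values never become smaller.
        inflation = max(1.0, statistics.median(chi2_values) / CHI2_1DF_MEDIAN)
    corrected = [value / inflation for value in chi2_values]
    # Upper-tail p-values for 1-degree-of-freedom chi-square statistics.
    p_values = [math.erfc(math.sqrt(value / 2)) for value in corrected]
    return inflation, corrected, p_values

chi2_values = [0.2, 0.91, 1.5, 4.0, 9.0]
inflation, corrected, p_values = genomic_control(chi2_values)
print(round(inflation, 3), [round(p, 4) for p in p_values])
```

Passing a value for `inflation` corresponds to the externally specified case mentioned above.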

## Overall Marker Statistics

Several types of overall marker statistics and genetic measures may be
output along with genotype association test results. These marker
statistics are the same as the ones obtained through the separate
Genotype Statistics by Marker window, and are detailed in the section
*Genotype Statistics by Marker*.

## Using the Genotype Association Test Window

Summary information for the dependent variable and the currently selected genotype model is displayed at the top of this window for reference. This information is visible from all three tabs in this dialog window.

### Data Requirements

Genotype Association Tests require a dataset containing genotype data
and either case/control or quantitative trait data. To use these tests,
first import your data into a Golden Helix SVS project (See
*Importing Your Data Into A Project*.) Once you have the spreadsheet for this data, select the
column representing the case/control status or quantitative trait as the
dependent variable (See *Column States*) and access the **Genotype
Association Tests** options dialog by selecting **Genotype** >
**Genotype Association Tests** from the spreadsheet menu.

Note

- It is common practice to inactivate those markers known to have data quality issues before testing, especially if you wish to use PCA.
- If you have case/control data with some missing values, see
*Importing Missing Values in a Case/Control Variable*. You can still analyze it as case/control data.

### Available Tabs

The genotype association test window consists of three tabs:

- **Association Test Parameters:** This tab contains all the parameters necessary for the association tests themselves, plus options for selecting principal component analysis for stratification correction of the test input data.
- **PCA Parameters:** This tab contains all of the remaining parameters for principal component analysis (PCA).

  Note

  These parameters are also available in the stand-alone Genotype Principal Component Analysis window. If you wish to perform principal component analysis on your data without performing an association test, see *Using the Genotypic Principal Components Analysis Window*.

- **Overall Marker Statistics:** This tab contains the parameters for obtaining overall marker statistics. These statistics are independent of any association test, other than the fact that most of these statistics will subdivide their results by overall, cases, and controls if a single case/control variable is the dependent variable. If Genotype Counts is selected and the dependent variable is quantitative, then the average value for each genotype will be computed.

  Note

  These parameters are also available in the stand-alone Genotype Statistics by Marker window, see *Genotype Statistics by Marker*.
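As an illustration of the Genotype Counts behavior for a quantitative dependent variable, the following minimal sketch counts each genotype and averages the trait within it. The function name, the genotype labels, and the use of `None` for missing values are assumptions made for this example and do not reflect how Golden Helix SVS stores data internally.

```python
from collections import defaultdict

def mean_trait_by_genotype(genotypes, trait):
    """Return {genotype: (count, mean trait value)}, skipping missing data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, y in zip(genotypes, trait):
        if g is None or y is None:  # None stands in for a missing value (assumption)
            continue
        sums[g] += y
        counts[g] += 1
    return {g: (counts[g], sums[g] / counts[g]) for g in counts}

# One marker, six samples; the fifth sample has a missing genotype.
genotypes = ["A_A", "A_B", "A_B", "B_B", None, "A_A"]
trait = [1.0, 2.0, 4.0, 5.0, 3.0, 3.0]
summary = mean_trait_by_genotype(genotypes, trait)
for genotype, (count, mean) in sorted(summary.items()):
    print(genotype, count, mean)
```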

### The Association Test Parameters Tab¶

In the **Association Test Parameters** tab (see
*Genotype Association Tests – Association Test Parameters Tab Allele
Frequency Classification* and *Genotype Association Tests – Association Test Parameters Tab Reference
and Alternate Allele Classification*),
select the allele classification,
select the one genetic model or other test
you wish to use, select whether to include missing values in the
analysis, select whether you wish to correct your input data for
stratification through PCA or through genomic control, and select all
of the statistical tests you wish to perform.

Optionally you may select multiple-testing corrections to perform for the non-exact statistical tests or to correct for stratification through Genomic Control.

Note

The inflation factor will be displayed in the Node Change Log for the Association Results spreadsheet.
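For orientation, the inflation factor in genomic control is commonly estimated as the median of the observed one-degree-of-freedom chi-squared statistics divided by the median of the chi-squared distribution with one degree of freedom (approximately 0.4549). The sketch below follows that textbook definition. Whether Golden Helix SVS floors the factor at 1, as done here, is an assumption, and the function name is hypothetical.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 df

def genomic_control(chi2_stats):
    """Return (inflation factor, corrected statistics)."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumption: never inflate the statistics when lambda falls below 1.
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]

# Hypothetical chi-squared statistics from five markers.
stats = [0.2, 0.5, 0.91, 1.4, 6.0]
lam, corrected = genomic_control(stats)
print(round(lam, 3), [round(c, 3) for c in corrected])
```

Each corrected statistic is the original divided by the inflation factor, so the corrected median matches the median expected under no inflation.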

This user interface is dynamic. Making certain choices will change the availability or selections available for other choices. Specifically, the following restrictions apply:

- Your choices of allele classification, genetic model, whether to use missing values, and whether to correct your input data through PCA determine which statistical tests are available.
- PCA is not available for basic allele tests or genotype tests.
- The additive model is not available when using missing data as predictors.
- Genomic control is not available when using missing data as predictors.
- PCA is not available when using missing data as predictors.
- Genomic control is not available at the same time as permutation testing.
- Genomic control is not available for the genotype model when the dependent variable is quantitative.
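The restrictions listed above can be read as a small set of rules. The following sketch encodes them as a function purely for illustration. The function name, argument names, and option labels are hypothetical and are not part of Golden Helix SVS.

```python
def unavailable_options(model, missing_as_predictors, quantitative, permutations):
    """Return the set of options that the listed restrictions rule out."""
    blocked = set()
    if model in ("basic allele", "genotype"):
        blocked.add("PCA")
    if missing_as_predictors:
        blocked.update({"additive model", "genomic control", "PCA"})
    if permutations:
        blocked.add("genomic control")
    if model == "genotype" and quantitative:
        blocked.add("genomic control")
    return blocked

# Genotype model with a quantitative trait: neither correction is offered.
print(sorted(unavailable_options("genotype", False, True, False)))
```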

If an option is hidden, grayed out, or otherwise inaccessible, one or more options you have already selected are incompatible with it.

Single Value Permutations and Full Scan Permutations can be run individually or together. You must provide a value for the number of permutations used in the test. When running both types of permutations together, the selected number of permutations is the same for both. The number of permutations should be greater than or equal to three. Permuted P-Values are calculated only for non-exact test statistics.
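The two permutation types can be sketched as follows, under the usual definitions: a single value permutation compares each marker's observed statistic with the permuted statistics of that same marker, while a full scan permutation compares it with the best permuted statistic across all markers. The correlation-based statistic, the plain-fraction p-value, and all names below are assumptions for illustration and may differ from the calculation Golden Helix SVS performs.

```python
import random

def trend_stat(genotype, phenotype):
    """n times the squared correlation between genotype and phenotype."""
    n = len(genotype)
    mg = sum(genotype) / n
    mp = sum(phenotype) / n
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotype, phenotype))
    sgg = sum((g - mg) ** 2 for g in genotype)
    spp = sum((p - mp) ** 2 for p in phenotype)
    if sgg == 0 or spp == 0:
        return 0.0
    return n * sgp * sgp / (sgg * spp)

def permutation_p_values(markers, phenotype, n_perm, seed=0):
    """Return (single value p-values, full scan p-values), one per marker."""
    rng = random.Random(seed)
    observed = [trend_stat(m, phenotype) for m in markers]
    single_hits = [0] * len(markers)
    full_hits = [0] * len(markers)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the dependent variable
        permuted = [trend_stat(m, shuffled) for m in markers]
        best = max(permuted)   # most significant statistic in this permutation
        for i, obs in enumerate(observed):
            single_hits[i] += permuted[i] >= obs
            full_hits[i] += best >= obs
    single = [h / n_perm for h in single_hits]
    full = [h / n_perm for h in full_hits]
    return single, full

# Three markers coded as minor allele counts for eight samples.
markers = [
    [0, 0, 1, 1, 2, 2, 1, 0],
    [2, 1, 0, 0, 1, 2, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
phenotype = [0, 0, 0, 1, 1, 1, 1, 0]
single, full = permutation_p_values(markers, phenotype, n_perm=200)
print(single, full)
```

Because the best permuted statistic is never smaller than the permuted statistic of any one marker, the full scan p-value in this sketch is always at least as large as the single value p-value.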

### The PCA Parameters Tab¶

If you selected to correct for stratification with PCA, you will be
able to select PCA parameters from this tab (see
*Genotype Association Tests – PCA Parameters Tab Allele Frequency
Classification* and *Genotype Association Tests – PCA Parameters Tab Reference and Alternate
Allele Classification*).

The principal components can be computed, or if they have already been
computed for the dataset, the spreadsheet of principal components can be
selected after selecting the “Use precomputed principal components”
option. See *Applying PCA to a Superset of Markers* and *Applying PCA to a Subset of Samples* for specific
limitations of this feature.

The other options include the number of components to be found,
normalization method, which, if any, spreadsheets to output, and whether
and how to eliminate component outlier subjects and recompute
components. See *Principal Component Analysis* for an explanation of the options for this
tab.

Note

- The genetic model and allele classification, selectable in the Association Test Parameters tab, are also parameters which influence finding the principal components.
- Correcting a binary dependent variable makes it continuous, and thus linear regression and the Correlation/Trend Test are the appropriate tests in this situation for those genetic models for which PCA correction is available.
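To see why a corrected binary variable becomes continuous, consider the following minimal sketch of an EIGENSTRAT-style adjustment, in which the projection of the variable onto each principal component is subtracted. It assumes the components are mean-zero and mutually orthogonal. The function names and data are hypothetical, and the exact correction Golden Helix SVS applies is described in *Correction of Input Data by Principal Component Analysis*.

```python
def remove_component(values, component):
    """Subtract from values their least-squares projection onto component."""
    gamma = (sum(v * c for v, c in zip(values, component))
             / sum(c * c for c in component))
    return [v - gamma * c for v, c in zip(values, component)]

def pca_correct(values, components):
    """Center values, then remove each principal component in turn."""
    mean = sum(values) / len(values)
    adjusted = [v - mean for v in values]
    for comp in components:  # assumed mean-zero and mutually orthogonal
        adjusted = remove_component(adjusted, comp)
    return adjusted

# Case/control status for six samples and one hypothetical component.
status = [1, 1, 1, 0, 0, 0]
pc1 = [0.5, 0.3, 0.1, -0.1, -0.3, -0.5]
corrected_status = pca_correct(status, [pc1])
print([round(v, 4) for v in corrected_status])
```

The corrected values are no longer restricted to two levels, which is why regression-based and correlation-based tests apply after correction.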

### The Overall Marker Statistics Tab¶

Here, you can optionally select to output any of the overall marker
statistics available in this tab (see *Genotype Association Tests – Overall Marker Statistics Tab Allele
Frequency Classification*
and *Genotype Association Tests – Overall Marker Statistics Tab Reference
and Alternate Allele Classification*). See
*Genotype Statistics by Marker* for an explanation of the options for genotype
marker statistics.

### Processing¶

When you have selected all the tests and outputs you wish to perform,
select the **Run** button to start the selected tests and correction
procedures. While the association test analysis itself is running, you
can press the **Cancel** button on the progress bar dialog to stop the
analysis.

When the tests are completed the output spreadsheet(s) will appear.

### Spreadsheet Outputs¶

Depending on the options selected, the following spreadsheets may be produced:

The results of the association tests and marker statistics will be displayed in the same spreadsheet. Each of the statistics calculated will be in its own column. If the original dataset was a marker mapped spreadsheet, this spreadsheet will have the rows marker mapped.

Note

Skipped markers are excluded from this spreadsheet.

If you requested an output spreadsheet of the PCA-corrected input data, this will be created. The PCA correction of the dependent variable will also be shown.

If you requested a principal components spreadsheet, this will be created with rows according to the patient or subject and columns according to the component. These components will be sorted by eigenvalue, large to small. Only the number of components requested will be shown.

If you requested an eigenvalue spreadsheet from PCA, it will simply list the eigenvalues, from large to small, for the number of components specified.

If you requested elimination of outlier subjects, and outliers were found, a spreadsheet will be made to list these outliers and the iteration and component in which they were found.

Note

If you wish to see any outputs in the form of p-value-style
plots, see *Numeric Value Plot* for genomic scale value plots
and *Uniform Numeric Value Plot* for uniform scale value plots.

## LD Score Regression¶

This feature takes GWAS Association Test Results and LD Scores to calculate heritability, genetic covariance, and genetic correlation.

It is recommended to perform quality control before running this feature by filtering on imputation quality, minor allele frequency, or both.

Note

Please see https://github.com/bulik/ldsc/ for more information on the ldsc package.

Also see the blog post *Understanding Your GWAS Signal with LD Scores* for background on when to use this feature.

Before running this feature, first import an “ldscore” file from LDSC
using the *Import LDSCORE Output* feature from the Import menu or
download one of the following LDSC precomputed
files using the Import > Public Data menu:

- EAS LD Scores (East Asian)
- EUR LD Scores (European)

Alternatively, you may compute LD Scores from your genotype
spreadsheet using the feature *LD Score Computation and Binning*.

Join the imported LDScore file with your GWAS test results spreadsheet and then run this feature.
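Conceptually, the join pairs each GWAS result row with the LD score of the same marker. The sketch below shows an inner join on a shared marker identifier. The column names (`rsid`, `ld_score`, `chi2`) and the decision to drop unmatched markers are assumptions for illustration, not the behavior of the Golden Helix SVS join dialog.

```python
def join_on_marker(gwas_rows, ld_rows, key="rsid"):
    """Inner-join two lists of row dictionaries on a shared marker key."""
    ld_by_key = {row[key]: row for row in ld_rows}
    joined = []
    for row in gwas_rows:
        match = ld_by_key.get(row[key])
        if match is not None:  # keep only markers present in both inputs
            joined.append({**match, **row})
    return joined

gwas_rows = [
    {"rsid": "rs1", "chi2": 3.2},
    {"rsid": "rs2", "chi2": 0.7},
    {"rsid": "rs9", "chi2": 1.1},  # no LD score available for this marker
]
ld_rows = [
    {"rsid": "rs1", "ld_score": 12.5},
    {"rsid": "rs2", "ld_score": 4.0},
]
joined = join_on_marker(gwas_rows, ld_rows)
for row in joined:
    print(row)
```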

- **Compute Heritability estimate only**: This will only compute the heritability of the statistic in the spreadsheet this feature was run from.
- **Compute Genetic Correlation with additional traits**: This will compute heritability on each spreadsheet and then compare the first spreadsheet with each subsequent spreadsheet and compute the genetic covariance and correlation.

If we choose **Compute Heritability estimate only**, the following options will be
available.

- **LD Fields**: The linkage disequilibrium column.
- **Sample Size**: The number of SNPs included in the GWAS test.
- **Missing Genotype Column**: A column that will subtract off the number of missing SNPs from the **Sample Size**.
- **Statistics Input**: The statistic produced from a GWAS study.

If we choose **Compute Genetic Correlation with additional traits**, the
following spreadsheet selection dialog will appear.

Additional trait spreadsheets (spreadsheets with GWAS results) can be selected here.

The next dialog will pertain to the original spreadsheet, the spreadsheet this feature was selected from.

The options will be the same as the heritability only option, but there will also
be an additional **Join Field** option.

This field should be a field in common across all spreadsheets, such as an RSID field.

The subsequent dialog(s) will have options for each additional trait spreadsheet.

These dialogs will have the same options as the previous dialog except there is no need to supply an LD column.

After all these options have been set, the ldsc script will run and a result viewer
will be created. If **Compute Heritability estimate only** was selected, then the
result will just contain the heritability of the selected statistic.

If **Compute Genetic Correlation with additional traits** was selected, then we’ll
get the heritability of each trait and the genetic covariance and correlation
between the first spreadsheet and each additional spreadsheet.
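The quantities reported can be related to the LD score regression model, in which the expected chi-squared statistic of a marker grows linearly with its LD score and the slope is proportional to heritability. The sketch below uses ordinary least squares to show that relationship. The ldsc package itself uses a weighted regression with jackknife standard errors, so this is a simplified illustration. The function names are hypothetical, and `n_samples` (GWAS sample size) and `n_snps` (number of SNPs) follow the notation of the LD score regression model rather than any SVS option name.

```python
from math import sqrt

def ols(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def heritability(ld_scores, chi2, n_samples, n_snps):
    """E[chi2_j] = (N * h2 / M) * l_j + intercept, so h2 = slope * M / N."""
    slope, intercept = ols(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept

def genetic_correlation(ld_scores, z1, z2, n1, n2, n_snps, h2_1, h2_2):
    """E[z1_j * z2_j] = (sqrt(N1 * N2) * rho_g / M) * l_j + intercept."""
    slope, _ = ols(ld_scores, [a * b for a, b in zip(z1, z2)])
    rho_g = slope * n_snps / sqrt(n1 * n2)    # genetic covariance
    return rho_g, rho_g / sqrt(h2_1 * h2_2)   # covariance, correlation

# Synthetic, noise-free values constructed so that h2 = 0.5.
ld_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
chi2 = [1.5, 2.0, 2.5, 3.0, 3.5]
h2, intercept = heritability(ld_scores, chi2, n_samples=1000, n_snps=1000)
print(round(h2, 6), round(intercept, 6))
```

The genetic correlation is the genetic covariance divided by the square root of the product of the two heritability estimates, which is why heritability is computed for every spreadsheet first.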