# 2.9. Principal Component Analysis¶

## 2.9.1. Principal Component Analysis Overview¶

### Correcting for Stratification¶

Sometimes finding an association can be confounded by population stratification. This is because a condition may be more prevalent in one group of people than in a different group, resulting in a spurious association between the condition or trait being tested for and any genetic characteristics which vary between the two different groups of people.

While it is good practice for studies to be based on as homogeneous a group of test subjects as possible, it has been noted in [Price2006] that even the mild variation in genetic characteristics among those who classify themselves as belonging to one ethnic group or another can be problematic enough to confound a study done over thousands of genetic markers.

Note

Hidden population stratification may be thought of as a non-zero F-statistic between unknown groupings of samples.

Please see Overview of F-Statistics for a further discussion of F-statistics.

Two methods are available in SVS for correction for population stratification:

Correction of Input Data by Principal Component Analysis, and

Correction of Output Chi-Square Values and P-Values by Genomic Control (see Correcting for Stratification by Genomic Control)

### Correcting for Batch Effects and Other Measurement Errors¶

It is noted in [Price2006] that there is evidence that variations in test equipment can easily introduce bias into studies.

Additionally, with Copy Number Variation (CNV) studies, variation between batches (batch effects) may easily affect the association test results. Principal Component Analysis may be used to correct for measurement variations in the input data, as well as for population stratification.

### Correction of Input Data by Principal Component Analysis¶

This technique was pioneered at the Broad Institute, which distributes a program called “EIGENSTRAT” implementing the PCA correction technique described in [Price2006]. A more thorough discussion of stratification, principal component analysis, and the eigenvalues involved may be found in [Patterson2006].

Note

The technique implemented in SVS uses methodology similar to that of the “EIGENSTRAT” program, with several enhancements.

#### Working Premise¶

If there is population stratification or there are batch effects, this pattern of variation over the samples should manifest within, or influence to a greater or lesser degree, the data of many different markers. By contrast, any association with the phenotype in question should manifest only over one or very few markers.

If one can find a way to remove any pattern(s) resulting from stratification or batch effects and test only the data without these patterns, the influence from the stratification or batch effects should be eliminated or minimized.

#### The PCA Technique¶

The object is to first determine the different patterns of variation over the subjects from within the markers, then to remove these patterns.

The following procedure is followed for each marker:

The average value for the marker across all samples is determined.

The average value is subtracted from all values for the marker to recenter the data about zero. The result is the total pattern of variation for the marker.

This pattern might be “normalized” or divided by a value determined from the marker’s data. The possibilities for normalization in SVS are:

The theoretical standard deviation of the marker’s data at Hardy-Weinberg Equilibrium (HWE). This is what the standard deviation of the data for the marker would be over the entire population if the marker in the population were in HWE and had the same major and minor (or reference/alternate) allele frequencies as observed for the marker.

The actual standard deviation of the data for the marker.

No normalization.

These (possibly normalized) individual marker patterns over the subjects are then “summed up” into a samples-by-samples matrix, referred to as a “Wishart” or “Wishart-like” matrix.

Note

The effects of normalizing or not normalizing the pattern data are simply to weight the contributions of the various markers to the Wishart matrix differently.
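The per-marker procedure above (recenter, optionally normalize, sum into the Wishart matrix) can be sketched in NumPy. This is an illustration only, not the SVS implementation; it assumes genotypes have already been numerically recoded under the additive model as 0, 1, or 2 copies of the minor allele:

```python
import numpy as np

def wishart_matrix(genotypes, normalization="hwe"):
    """Accumulate a samples-by-samples Wishart-like matrix from
    additively coded genotypes (0/1/2), one column per marker.

    normalization: "hwe"    - divide by the theoretical SD at HWE,
                   "actual" - divide by the marker's observed SD,
                   "none"   - no normalization.
    """
    n_samples, n_markers = genotypes.shape
    W = np.zeros((n_samples, n_samples))
    for j in range(n_markers):
        g = genotypes[:, j].astype(float)
        pattern = g - g.mean()              # recenter the marker about zero
        if normalization == "hwe":
            p = g.mean() / 2.0              # minor-allele frequency estimate
            sd = np.sqrt(2.0 * p * (1.0 - p))  # SD of Binomial(2, p) at HWE
        elif normalization == "actual":
            sd = g.std()
        else:
            sd = 1.0
        if sd > 0:
            pattern /= sd
        W += np.outer(pattern, pattern)     # sum this marker's pattern into W
    return W
```

With no normalization this is simply `X @ X.T` for the column-centered data matrix `X`, which makes the weighting role of the normalization choices explicit.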

The “eigenvectors” and “eigenvalues” of this matrix are called its “components”, and the “eigenvectors” corresponding with its largest “eigenvalues” are called its “principal components”.

It so happens that (according to the working premise of the “EIGENSTRAT” PCA technique) the first principal component or the first few principal components correspond directly to the stratification pattern(s), based on the prevalence of this pattern or these patterns of stratification over so many markers.

Therefore, the final two steps of stratification correction through the “EIGENSTRAT” PCA technique are to find the top k principal components (where you select k) and remove these components/patterns from both the marker data and the dependent variable using a vector-analysis related technique.

For a more precise and “mathematical” explanation of this process, please see Formulas for Principal Component Analysis.
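As a rough sketch of the removal step (again illustrative, not the SVS code), the top k eigenvectors of the Wishart matrix can be projected out of any per-sample vector, such as one marker's recoded data or the phenotype:

```python
import numpy as np

def remove_top_components(W, data_vector, k):
    """Project out the top k principal components (eigenvectors of the
    Wishart matrix W with the largest eigenvalues) from a per-sample
    vector such as one marker's recoded genotypes or the phenotype."""
    eigvals, eigvecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    top = eigvecs[:, -k:]                  # k eigenvectors, largest eigenvalues
    v = data_vector - data_vector.mean()   # center first, as in the procedure
    # Vector-analysis step: subtract each component's projection.
    return v - top @ (top.T @ v)
```

After both the marker data and the dependent variable have been adjusted this way, the residuals carry no contribution from the removed components.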

#### How Many Components to Use?¶

This is, to some degree, an open question, as it is a matter of personal preference. Note that if you choose as many components as there are markers, you will wind up subtracting out ALL effects, thus getting nothing from your tests. The maximum number of components that can be selected is the number of samples less one. If a larger number is selected, the number of principal components found will still be the number of samples less one.

The best answer consists of first simply obtaining the components themselves and their corresponding eigenvalues for n - 1 principal components, where n is the total number of samples in the dataset. This can be done either by running uncorrected tests or from the separate Principal Component Analysis windows in the Quality Assurance menu from a spreadsheet. See Genotypic Principal Component Analysis or Numeric Principal Component Analysis.

Then evaluate the pattern of the eigenvalues in the spreadsheet or plot the eigenvalue variable column. (Right-click on the column header and select Plot Variable.) If the first few are very large compared with the remaining eigenvalues, then use that many components in a second analysis. Another way to determine the number of components to use is to examine cluster plots of the first few principal components plotted against each other, i.e. one versus two, two versus three, and so on. (See Multi-Color Scatter Plots for PCA or Gender Analysis.) When there are no longer discernible clusters of groups based on phenotype categories, that is the number of components to use for PCA correction.

#### A Second Answer (for Genotypic Data Only)¶

The inflation factor can also be used to adjust for stratification. An inflation factor close to one would indicate stratification is not present in the data. The number of principal components can be determined by increasing or decreasing the number of components until the inflation factor is as close to one as possible. Using the inflation factor to determine the number of principal components to use is a variant of Genomic Control. See Correcting for Stratification by Genomic Control for more information.

#### Removing Outlier Subjects¶

If you are interested in what the principal components are, or wish to be more sure you have corrected for stratification through PCA, there is the option of repeating the determination of the principal components after removing patients or subjects from analysis who are found to have extreme values in one or more of the previously-determined principal components.

Golden Helix SVS automates this process. Select the number of principal components to use in this process, the number of standard deviations within which a sample must fall in each of these components, and how many times to repeat the process. The final testing and principal components spreadsheet will avoid using any of the outlier subjects, which will be logged in a separate spreadsheet.
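A minimal sketch of this outlier loop (the SVS defaults and internals are not reproduced here; `compute_pcs` is a hypothetical callback standing in for the principal component computation):

```python
import numpy as np

def drop_pc_outliers(components, n_sd=6.0):
    """Return a boolean keep-mask over samples, flagging any sample whose
    value in any supplied principal component lies more than n_sd
    standard deviations from that component's mean."""
    sd = components.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)          # guard constant components
    z = (components - components.mean(axis=0)) / sd
    return np.all(np.abs(z) <= n_sd, axis=1)

def iterative_pca_outlier_removal(data, compute_pcs, n_components=10,
                                  n_sd=6.0, n_iterations=5):
    """compute_pcs(data) is a hypothetical callback returning a
    samples-by-components array for the given data; outliers found in
    each pass are removed before the components are recomputed."""
    keep = np.ones(data.shape[0], dtype=bool)
    for _ in range(n_iterations):
        pcs = compute_pcs(data[keep])[:, :n_components]
        mask = drop_pc_outliers(pcs, n_sd)
        if mask.all():
            break                            # no new outliers this pass
        idx = np.flatnonzero(keep)
        keep[idx[~mask]] = False             # removed samples would be logged
    return keep
```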

#### PCA-Corrected Association Testing¶

The only tests available, once you have corrected the input data for
stratification using Principal Component Analysis, are the
**Correlation/Trend Test** (genotypic Correlation/Trend Test and numeric
Correlation/Trend Test) and **Logistic/Linear Regression** (genotypic
Logistic/Linear Regression and numeric Linear/Logistic Regression).

(The exception to this is that for numeric association tests
on a binary dependent variable, where the dependent variable is not
corrected through PCA but only the predictors are corrected, the
**T-Test** is additionally available.)

This is because the **Correlation/Trend Test** and **Logistic/Linear
Regression** are based on numeric values for both the predictor and
the response. Thus, these tests may still be run, even if genotype
designations or case/control designations have been transformed from
simple categorical numeric designations such as zero, one, or two to
real numbers that may take on any necessary values.

## 2.9.2. Genotypic Principal Component Analysis¶

### Data Requirements¶

Genotypic principal components analysis requires a dataset containing
genotypic data. First import genotypic data into an SVS
project (see Importing Your Data Into A Project). Once a spreadsheet has been created in a
project with genotypic data, access the Genotypic Principal Component
Analysis options dialog by selecting **Genotype** >
**Genotypic Principal Component Analysis** from the spreadsheet menu.

Before principal components can be computed for genotypic data, the allele classification must be specified. The options are to classify alleles based on allele frequency (major/minor alleles) or by reference/alternate alleles as indicated by a marker map field specifying the reference allele. Based on the classification of alleles, a numeric equivalent for each genotype is established, depending on whether the additive, dominant, or recessive genetic model is selected. (The PCA technique is not available for the basic allelic test or the genotypic test, since establishing a numeric equivalent is not as straightforward, or even possible, for those models.)

Note

It is common practice to inactivate markers that are known to have data quality issues before using principal components analysis.
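As an illustration of the numeric recoding (a sketch, not the SVS code), a genotype can be converted by counting copies of the minor (or alternate) allele and then collapsing per the chosen genetic model:

```python
def recode_genotype(genotype, minor_allele, model="additive"):
    """Numeric equivalent of a two-allele genotype string such as "AG".
    Counts copies of the minor (or alternate) allele, then collapses
    the count according to the chosen genetic model."""
    count = sum(1 for allele in genotype if allele == minor_allele)
    if model == "additive":
        return count                      # dd -> 0, Dd -> 1, DD -> 2
    if model == "dominant":
        return 1 if count >= 1 else 0     # (DD, Dd) vs. (dd)
    if model == "recessive":
        return 1 if count == 2 else 0     # (DD) vs. (Dd, dd)
    raise ValueError("unknown genetic model: %s" % model)
```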

### Using the Genotypic Principal Components Analysis Window¶

Note

This window (see Genotype Principal Component Analysis) essentially accomplishes
the functions of the PCA Parameters tab from the Genotype Association
Tests dialog obtained through the Analysis menu when **Correct for
Stratification with PCA** is selected on the Association Test
Parameters tab. The only difference is that it is not necessary to
simultaneously perform an association test to use the separate
Genotype Principal Component Analysis window.

### Processing¶

The principal components can be computed, or if they have already been computed for the dataset, the spreadsheet of principal components can be selected after selecting the “Use precomputed principal components” option. See Applying PCA to a Superset of Markers and Applying PCA to a Subset of Samples for specific limitations of this feature.

Select the PCA parameters - specifically, classification of alleles, the genetic model, maximum number of components to find and correct for, normalization method, spreadsheets to output, and whether to eliminate component outlier subjects. When PCA outlier removal is performed by recomputing components, there are selections for the number of times to recompute the components, the criteria for determining an outlier, and the number of components to remove outliers from. See Correction of Input Data by Principal Component Analysis for more information on the options on this dialog.

Select the **Run** button to perform the analysis.

When the analysis is complete, a message indicating the number of markers analyzed and the number of markers that were skipped will be appended to the Node Change Log for each output spreadsheet, and all spreadsheets selected for output will be opened.

### Spreadsheet Outputs¶

The possible output spreadsheets are as follows:

The corrected input data. (Recall that genotypic data is first converted to numeric data by the selected genetic model.)

The principal components spreadsheet with rows according to the sample and columns according to the component. These components will be sorted by eigenvalue, large to small. Only the number of components requested will be shown.

The eigenvalue spreadsheet will simply show the eigenvalues from greatest to smallest (of the number of components requested).

If re-computation of components after removal of outliers was selected, and outliers were found, then a spreadsheet will be created to list these outliers and the iteration and component in which they were found.

Note

If you wish to plot any outputs (such as the one column in the eigenvalue spreadsheet), select **Plot Numeric Values** from the **Plot** menu. The column of data will be plotted against the row labels or eigenvalue number.

If you wish to plot a principal component eigenvector against another eigenvector, select **XY Scatter Plots** from the **Plot** menu. See Multi-Color Scatter Plots for PCA or Gender Analysis or N by N Scatter Plots for more information.

## 2.9.3. Numeric Principal Component Analysis¶

### Data Requirements¶

Numeric principal components analysis requires a dataset containing
numeric data. First import numeric data into an SVS project
(see Importing Your Data Into A Project). Once a spreadsheet has been created in a project with
numeric binary, integer, and/or real-valued data, access the Numeric
Principal Component Analysis options dialog by selecting **Numeric** >
**Numeric Principal Component Analysis** from the
spreadsheet menu.

Note

It is common practice to inactivate markers known to have data quality issues before using principal components analysis.

### Using the Numeric Principal Components Analysis Window¶

Note

This window (see Numeric Principal Component Analysis) essentially
accomplishes the functions of the PCA Parameters tab from the Numeric
Association Tests dialog obtained through the Analysis menu when
**Correct for Batch Effects/Stratification with PCA** is selected on
the Association Test Parameters tab. The only difference is that it
is not necessary to simultaneously perform an association test to use
the separate Numeric Principal Component Analysis window.

### Processing¶

The principal components can be computed, or if they have already been computed for the dataset, the spreadsheet of principal components can be selected after selecting the “Use precomputed principal components” option. See Applying PCA to a Superset of Markers and Applying PCA to a Subset of Samples for specific limitations of this feature.

Select the PCA parameters - specifically, the maximum number of components to find and correct for, the spreadsheets to output, whether to center by marker or by sample or by both marker and sample, and whether to eliminate component outlier subjects. When PCA outlier removal is performed by recomputing components, there are selections for the number of times to recompute the components, the criteria for determining an outlier, and the number of components to remove outliers from.

For new projects, data centering by marker is normally recommended. Data centering by sample may be useful for components that are applied to chromosomes or regions other than those from which they are originally computed. For more information on these options, see Centering by Marker vs. Not Centering by Marker and Centering by Sample vs. Not Centering by Sample.

See Correction of Input Data by Principal Component Analysis for more information on other options in this dialog.

Select the **Run** button to perform the analysis.

When the analysis is complete, a message indicating the number of components found and the number of predictors analyzed will be appended to the Node Change Log for each output spreadsheet. All spreadsheets selected for output will be opened.

### Spreadsheet Outputs¶

The possible output spreadsheets are as follows:

The corrected input data.

The principal components spreadsheet with rows according to the sample and columns according to the component. These components will be sorted by eigenvalue, large to small. Only the number of components requested will be shown.

The eigenvalue spreadsheet will simply show the eigenvalues from greatest to smallest (of the number of components requested).

If recomputing components after removal of outliers was selected, and outliers were found, then a spreadsheet will be created to list these outliers as well as the iteration and component in which they were found.

Note

If you wish to plot any outputs (such as the one column in the eigenvalue spreadsheet), select **Plot Numeric Values** from the **Plot** menu. The column of data will be plotted against the row labels or eigenvalue number.

If you wish to plot a principal component eigenvector against another eigenvector, select **XY Scatter Plots** from the **Plot** menu. See Multi-Color Scatter Plots for PCA or Gender Analysis or N by N Scatter Plots for more information.

## 2.9.4. Finding Principal Components When the Sample Size is Large¶

When the sample size exceeds 4,000, finding the principal components using Genotypic Principal Component Analysis or Numeric Principal Component Analysis will show a noticeable delay, with 14,000 samples taking on the order of 12 minutes for principal component processing and 28,000 samples taking more than an hour. Finally, there is an ultimate sample-size limit for these features to function at all, which is just over 40,000 samples; at that size, the total processing can take several hours to finish.

To address this issue, a feature has been added that will find a few principal components when the sample size is large or very large. This new feature has been used on datasets containing up to 400,000 samples.

This feature may be accessed from either a spreadsheet with mapped
genotypic data or from a spreadsheet with mapped numeric data. In either
case, use **Genotype > Compute Large Data PCA** from the spreadsheet
menu. The appropriate user interface will come up according to your
data type.

### Basic Concept¶

As indicated in The PCA Technique, obtaining principal components from genotypic or numeric data requires two phases:

A Wishart matrix (see Further Motivation: Relation to the Variance-Covariance Matrix for the deep math) must be generated.

(In Genotypic Principal Component Analysis and Numeric Principal Component Analysis, this matrix, which is generated when you see the indication of progress scanning the data, is never output.)

The actual eigenvalues and eigenvectors are computed.

(In Genotypic Principal Component Analysis and Numeric Principal Component Analysis, there is only a status dialog shown during this phase. Internally, Genotypic Principal Component Analysis and Numeric Principal Component Analysis compute all of the eigenvalues and eigenvectors before outputting those that are requested, even if only a very few eigenvalues and eigenvectors are requested.)

For large-data PCA, (1) the Wishart matrix is not only generated but also output, (2) the Wishart matrix is stored on disk, with no attempt made to hold the entire matrix in RAM, and (3) only the eigenvalues and eigenvectors that are requested are actually computed. For the details of how these requested eigenvalues and eigenvectors are computed, please see Large Kinship Matrices or Large Numbers of Samples.

### Data Requirements for Genotypic Data¶

For large-data principal components analysis on genotypic data, the data requirements are the same as those stated in Data Requirements, except that (1) only mapped genotypic data will be used, and (2) allele classification will always be based on allele frequency (major vs. minor allele).

### Using the Large Data PCA Window for Genotypic Data¶

The following window (Principal Component Analysis of Genotypes–Large Data Approach) displays when you
request **Genotype > Compute Large Data PCA** with genotypic data.

#### Step 1: Compute the Wishart Matrix¶

If you need to compute the Wishart Matrix, leave *STEP 1: COMPUTE THE
WISHART MATRIX* checked. In this left-hand box, select the following:

*PCA Marker Normalization*: whether and how data is normalized, or re-scaled, before it is added into the Wishart Matrix.

*Theoretical standard deviation under HWE*: What the standard deviation of the marker data, as numerically recoded, would be, based on the marker’s allele frequencies, assuming the marker is in Hardy-Weinberg equilibrium.

*Actual standard deviation*: The actual standard deviation of the marker data as numerically recoded.

*Apply no normalization*: Just use the numerically-recoded data as is.

*Genetic Model for PCA*: What numeric recoding to use for the genotypic data. In the following, ‘d’ is the major allele, while ‘D’ is the minor allele.

*Additive model: (dd)->(Dd)->(DD)*

*Dominant model: (DD,Dd) vs. (dd)*

*Recessive model: (DD) vs. (Dd, dd)*

*Impute missing genotypic data as:*

*Homozygous major allele*: Basically, this substitutes zero for missing values in the numeric recoding.

*Numerically as average value*: The non-missing numerically recoded data is averaged over the current marker, and that average is substituted for every missing value in the current marker.

Note

In Genotypic Principal Component Analysis and Numeric Principal Component Analysis, missing values are simply dropped. This may produce some differences in results when there is missing data.
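The two imputation options can be sketched as follows (an illustration, assuming missing values are represented as NaN in the numerically recoded column; this is not the SVS internal representation):

```python
import numpy as np

def impute_marker(values, method="average"):
    """Fill missing entries (NaN) in one marker's numerically recoded
    column. "zero" substitutes 0 (homozygous major allele);
    "average" substitutes the marker's non-missing mean."""
    v = np.asarray(values, dtype=float).copy()
    missing = np.isnan(v)
    if method == "zero":
        v[missing] = 0.0
    else:
        v[missing] = v[~missing].mean()
    return v
```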

Note

This step will run faster when you have more RAM dedicated
to this process. Specifically, use *Transpose and analysis memory
usage* under **Tools > Global Product Options** to change this
setting. (This option must be accessed when you are not in any
project.)

Meanwhile, if you already have a pre-computed Wishart Matrix, uncheck
*STEP 1: COMPUTE THE WISHART MATRIX* and, under *STEP 2: FIND THE
PRINCIPAL COMPONENTS*, check *Use Pre-Computed Wishart Matrix* and
select your pre-computed Wishart Matrix.

#### Step 2: Find the Principal Components¶

Note

If you only need to pre-compute a Wishart Matrix for later
use, uncheck *STEP 2: FIND THE PRINCIPAL COMPONENTS*.

This step does the actual eigenvalue and eigenvector computation. An approach based on an algorithm specified in [Halko2010] is used. For the algorithm details, please see Large Kinship Matrices or Large Numbers of Samples.

*Use Pre-Computed Wishart Matrix*: As stated above, if the Wishart Matrix has already been computed and you wish to use it, uncheck *STEP 1: COMPUTE THE WISHART MATRIX* and check this box, selecting the Wishart Matrix you wish to use.

*Number of Principal Components* / *Find __ Principal Components*: Select how many principal components (eigenvectors and eigenvalues) to compute. This algorithm is optimized for computing just a few of the eigenvector/eigenvalue pairs that have the highest eigenvalues. The remaining eigenvector/eigenvalue pairs do not explicitly enter into (or bog down) the computation.

*Halko Algorithm Random Number Initialization*: The [Halko2010] algorithm incorporates pseudo-random numbers that need to be generated. This pseudo-random number generator has to be initialized with a “seed” value from which it can generate the pseudo-random numbers.

*Fixed random seed (reproducible results)*: This is the default choice. With this choice, one fixed random seed (specifically 12345678) is always used. If the program must be re-run, using this choice a second time (with all other parameters being the same) will reproduce the exact same results as the original run.

*Dynamic random seed (investigate precision)*: Use this choice (on a second or later run) to obtain a gauge of algorithm accuracy. “Dynamic” refers to the fact that a different random seed is used every time a run is made using this choice. Due to the differing random seeds, the results will differ between runs, both between a “Fixed” run and any “Dynamic” run, and between any two “Dynamic” runs. These differences are a gauge of the actual algorithm accuracy, which will be proportional to these differences. As a rule of thumb, the actual error will almost always be within 10 times the amount of the differences between the “Fixed” results and a set of “Dynamic” results or between two sets of “Dynamic” results.

Note

The components that are more “Principal”, that is, those
that have the higher eigenvalues, will always have a higher level
of accuracy with this algorithm. For instance, the default *Number
of Principal Components* of 15 is meant to achieve a good level of
accuracy for the highest 10 components, while only achieving a
reasonable level of accuracy for the other 5 components.

### Spreadsheet Outputs¶

The possible output spreadsheets are as follows:

The Wishart Matrix. This matrix is in the same format as a pre-computed kinship matrix (see Precomputed Kinship Matrix Option).

The principal components spreadsheet with rows according to the sample and columns according to the component. These components will be sorted by eigenvalue, large to small. Only the components that were requested and computed will be shown.

Note

This format is the exact same format that is produced by Genotypic Principal Component Analysis (and by Numeric Principal Component Analysis).

The eigenvalue spreadsheet will simply show the eigenvalues from greatest to smallest (of the number of components requested and computed).

Note

This format is also the exact same format that is produced by Genotypic Principal Component Analysis (and by Numeric Principal Component Analysis).

### Data Requirements for Numeric Data¶

For large-data principal components analysis on numeric data, the data requirements are the same as those stated in Data Requirements, except that only mapped numeric data will be used for principal components analysis.

Also note that this feature will always center the data by marker (and not center the data by sample).

### Using the Large Data PCA Window for Numeric Data¶

The following window (Principal Component Analysis of Numeric Data–Large Data Approach) displays when you
request **Genotype > Compute Large Data PCA** with numeric data.

#### Step 1: Compute the Wishart Matrix¶

Step 1 for numeric data is almost, but not quite, the same as Step 1 for genotypic data.

If you need to compute the Wishart Matrix, leave *STEP 1: COMPUTE THE
WISHART MATRIX* checked. In this left-hand box, select the following:

*PCA Marker Normalization*: whether and how data is normalized, or re-scaled, before it is added into the Wishart Matrix.

*Theoretical standard deviation under HWE (*)*: What the standard deviation of the data would be, based on assuming the data is genotypic marker data which has been numerically recoded using the additive model, deriving the major and minor allele frequencies from that assumption, then finding what the standard deviation would be for those frequencies if the marker were in Hardy-Weinberg equilibrium.

Note

Choosing this normalization will require the input data to be integer (or binary) and only have values of 0, 1, or 2.

*Actual standard deviation*: The actual standard deviation of the numeric data.

*Apply no normalization*: Just use the numeric data as is.

Note

Any numeric data will be considered valid if either of the above two normalization options (*Actual standard deviation* or *Apply no normalization*) is chosen.

Note

*Apply no normalization* will process the data the same as would Numeric Principal Component Analysis using *Center data by marker* (and not using *Center data by sample*).

*Impute missing data as:*

*Zero (homozygous major allele *)*: This substitutes zero for missing values in the numeric data.

*Numerically as average value*: The non-missing data is averaged over the current position/marker, and that average is substituted for every missing value in the current position/marker.

Note

In Genotypic Principal Component Analysis and Numeric Principal Component Analysis, missing values are simply dropped. This may produce some differences in results when there is missing data.

Note

This step will run faster when you have more RAM dedicated
to this process. Specifically, use *Transpose and analysis memory
usage* under **Tools > Global Product Options** to change this
setting. (This option must be accessed when you are not in any
project.)

Meanwhile, if you already have a pre-computed Wishart Matrix, uncheck
*STEP 1: COMPUTE THE WISHART MATRIX* and, under *STEP 2: FIND THE
PRINCIPAL COMPONENTS*, check *Use Pre-Computed Wishart Matrix* and
select your pre-computed Wishart Matrix.

#### Step 2: Find the Principal Components¶

Note

If you only need to pre-compute a Wishart Matrix for later
use, uncheck *STEP 2: FIND THE PRINCIPAL COMPONENTS*.

Step 2 for numeric data, which does the actual eigenvalue and eigenvector computation, is exactly the same as Step 2 for genotypic data. Please see Step 2: Find the Principal Components.

### Spreadsheet Outputs¶

The possible output spreadsheets are as follows:

The Wishart Matrix. This matrix is in the same format as a pre-computed kinship matrix (see Precomputed Kinship Matrix Option).

The principal components spreadsheet with rows according to the sample and columns according to the component. These components will be sorted by eigenvalue, large to small. Only the components that were requested and computed will be shown.

Note

This format is the exact same format that is produced by Numeric Principal Component Analysis (and by Genotypic Principal Component Analysis).

The eigenvalue spreadsheet will simply show the eigenvalues from greatest to smallest (of the number of components requested and computed).

Note

This format is also the exact same format that is produced by Numeric Principal Component Analysis (and by Genotypic Principal Component Analysis).

## 2.9.5. Correcting for Stratification by Genomic Control¶

This somewhat older method, pioneered by Devlin and Roeder [DevlinAndRoeder1999], notes that when association tests are confounded by stratification, the distribution of their chi-square statistics will be more “spread out” than it should be. The result is a higher median than that of a true chi-square distribution. Several models exist for how much the distribution should be spread out, depending on the test type, but the distribution will usually be uniformly spread out by a certain “inflation factor”, denoted lambda.

This technique of Genomic Control measures this “inflation factor” lambda by taking the median of the distribution of the chi-square statistic from results of an actual test done over a set of markers from the study in question, and dividing this median by the median of the corresponding (ideal) chi-square distribution. If the result is less than one, the distribution is considered close enough to ideal, and lambda is taken to be one.

Then, Genomic Control applies its correction by dividing the actual association test chi-square statistic results by this lambda, thus possibly making these results appropriately more pessimistic.
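The measurement and correction steps can be sketched as follows (illustrative only; the constant is the median of the 1-df chi-square distribution, which applies when the test statistics have one degree of freedom):

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of the 1-df chi-square distribution

def genomic_control(chi_square_stats):
    """Estimate the inflation factor lambda as the median observed
    chi-square over the theoretical 1-df chi-square median, floor it
    at one, and return it with the corrected statistics."""
    lam = np.median(chi_square_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)          # result below one -> treated as no inflation
    return lam, np.asarray(chi_square_stats) / lam
```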

Two approaches exist for this:

Measure the “inflation factor” over a set of markers designed to indicate population stratification. Then, use this on the actual association test (presumably done for just a few candidate markers). (For studies over a small number of markers.)

Measure the “inflation factor” over the actual association tests being done. Then afterward, use this on all chi-square results so obtained. (For whole-genome scans or a large number of markers.)

Both of these approaches are available in SVS, through the
**Genotype** > **Genotype Association Tests** dialog in the Analysis
menu.

From the Genotype Association Tests dialog, select **Show Inflation
Factor (Lambda), Chi-Squares, and Corrected Values** to find inflation
factors (lambda) and the results of applying the Genomic
Control technique on chi-squares, p-values, Bonferroni-adjusted
p-values, and False Discovery Rates.

Note

The inflation factor will be displayed in the Node Change Log for the Association Results spreadsheet.

From the Genotype Association Tests dialog, select **Correct Using This
Inflation Factor (Lambda) Instead:** and enter a value of lambda to use
as the “inflation factor”, determined from a previous
association test run or from other data previously analyzed.

Note

The inflation factor relates to the chi-square statistic. After a chi-square statistic has been corrected through Genomic Control, the normal procedure for finding the approximate p-value is still followed. If there had been “inflation”, the Genomic Control-corrected p-value will be pushed closer to one.