# Mixed Linear Model Analysis with Interactions¶

Gene-Environment interactions may be analyzed using the **Mixed Linear
Model Analysis with Interactions** tool. The methods which are
available in this tool include:

- Linear regression with interactions
- Mixed Model GWAS using a single predictor (EMMAX) with interactions

These two methods are very similar to their non-interaction
counterparts (see *Mixed Linear Model Analysis*). The difference is
that a full-vs-reduced model is used which consists of the following for
the reduced model:

- The constant (“intercept”) term
- Any non-interacting covariates you may specify
- The interaction-term covariates you specify, not yet multiplied by the predictor
- The (single-SNP/single-locus) predictor

and for the full model, the above plus the following:

- The interaction terms. Each term is the element-by-element product of the corresponding interaction-term covariate with the predictor.

(These are in addition to the error term(s) of each model.)

For an explanation of the underlying mathematics, see
*Overview of Mixed Linear Models*, which includes the section
*Optimization when Gene-Environment Interaction Terms Are Included*.

## Performing Analysis¶

To perform mixed linear model analysis with interactions on genotypic
data or recoded genotypic data, open a spreadsheet and select a column
for the dependent variable. The dependent variable must be either
quantitative (real-valued or integer-valued) or a binary case/control
status column. To open the Mixed Linear Model Analysis with
Interactions window, select the **Genotype** > **Mixed Linear Model
Analysis with Interactions** menu item. This feature is currently
supported for spreadsheets with only one column set as dependent.
Categorical dependent columns are currently not supported.

The main tab of the Mixed Linear Model Analysis with Interactions
window (*Mixed Linear Model Analysis with Interactions Window (Main Tab)*) allows for various
methods and parameters to be set or changed. A list and brief
description of these options is as follows:

**Linear regression (fixed effects only) with Interactions:**This method performs a (fixed-effect-only) linear regression analysis with interactions. If the marker mapped data is genotypic, then the data is recoded first into the specified numeric model (additive, dominant or recessive). This linear regression analysis uses the same F test (Fast F Test) as the mixed linear models (EMMAX) and has a choice of imputation methods. See*Selecting the Interaction Terms*and*Correct for Additional Covariates*for instructions on including the interaction terms and other covariates in the model.**Single-locus mixed model GWAS (EMMAX) with Interactions:**[Kang2010] This method is an EMMAX implementation by [Vilhjalmsson2012] which analyzes both fixed effects containing interactions and random effects. See*Selecting the Interaction Terms*and*Correct for Additional Covariates*for instructions on including the interaction terms and other covariates in the model as fixed effects. A kinship matrix is always required to help describe the random effects. Either an identity-by-state (IBS) kinship matrix will be computed from the genotypic data or a pre-computed kinship matrix can be selected to speed up analysis. See*Precomputed Kinship Matrix Option*for more information about kinship matrices.

An additional parameter specific to the EMMAX with Interactions method is

**Use Pre-Computed Kinship Matrix (Cov. Matrix of Random Effects)**: See*Precomputed Kinship Matrix Option*.

Additional parameters for all regression models include:

**Genetic Model and Imputation:****Genetic model to use:**The genetic models available include:*Additive*: Recodes Major Homozygous genotype (dd) to 0, Heterozygous (Dd) to 1, and Minor Homozygous (DD) to 2.*Dominant*: Recodes Major Homozygous genotype (dd) to 0 and Heterozygous (Dd) and Minor Homozygous (DD) to 1.*Recessive*: Recodes Major Homozygous (dd) and Heterozygous (Dd) to 0 and Minor Homozygous (DD) to 1.

Note

If you are running this analysis from a numerically recoded spreadsheet, this prompt will read

**Genetic model used for recoding the original spreadsheet**, and you should enter which of the above recoding operations you have already performed (see*Recode Genotypes*) or would have performed to create the spreadsheet from which you are running this analysis.**Impute missing data as:**The options for imputing the missing genotypes include setting the missing genotype to*Homozygous major allele*: Always sets the missing genotypes to 0.*Numerically as average value*: Uses the average non-missing recoded genotype values for each marker as the value to use for recoding missing genotypes.Note

If

**Correct for Hemizygous Males**(see below) is also selected, and there is non-missing data for both males and females in a given marker, averages for males and females will be computed and applied separately.

**Correct for Hemizygous Males**: Recodes the X-Chromosome genotypes for males as 0 or 1. Assumes the column is coded as if the male were homozygous for the X-Chromosome allele in question.**Choose Sex Column:**Choose the spreadsheet column that specifies the gender of the sample. This column may either be categorical (“M” vs. “F”) or binary (0 = male, 1 = female).**Chromosome that is hemizygous for males:**Usually the X Chromosome, which is the default.

**Please Enter the Interaction Terms**: See*Selecting the Interaction Terms*.**Correct for Additional Covariates**: See*Correct for Additional Covariates*.

## Selecting the Interaction Terms¶

Interaction-term covariates can be selected for inclusion in the model from columns of this spreadsheet. As noted above, interaction-term covariates are included in the model in two different ways:

- Each interaction-term covariate that you select will act, by itself, as one reduced-model covariate.
- Each interaction-term covariate is also multiplied, element by element, with the predictor (the current marker/locus being analyzed) to create one of the actual interaction terms to be included in the full model.

Interaction-term covariates can be binary, integer, real-valued,
categorical or (if actual genotypic data rather than recoded genotypic
data is being used for the analysis) genotypic. In all cases, if a
marker is used as an interaction-term covariate, it will not be
included in the analysis in any other way. Click on **Add Columns** to
get a choice of spreadsheet columns to use.

## Correct for Additional Covariates¶

Additional fixed-effect reduced-model covariates (which are not a part
of any interaction term) can also be selected for inclusion in the
model from columns of this spreadsheet. These covariates can be
binary, integer, real-valued, categorical or (if actual genotypic data
rather than recoded genotypic data is being used for the analysis)
genotypic. In all cases, if a marker is used as an additional
covariate, it will not be included in the analysis in any other
way. To begin, check the **Correct for Additional Covariates** option,
then click on **Add Columns** to get a choice of spreadsheet columns
to use.

## Additional Outputs¶

The second tab of the Mixed Linear Model Analysis with Interactions
window is the same as the second tab of the Mixed Linear Model
Analysis window (see *Mixed Linear Model Analysis Window (Second Tab)*). This tab
allows for additional outputs to be added to either output
spreadsheet. Four of these selectable outputs are enhancements to the
p-value output:

**Bonferroni multiple testing correction****False discovery rate (FDR)****Output data for P-P/Q-Q plots****Output -log 10(P)**Negative log base 10 of the p-values. This is the optimal column to use for plotting the values in a genome browser.

The other four are genotype statistics, which are placed at the end of the spreadsheet.

**Call rate (fraction not missing)****Allele frequencies****Genotype counts****Allele Counts**

Note

- If you are using a numerically recoded spreadsheet and are not
using the additive model, only
**Call rate**is selectable as a genotype statistic. - Before the genotype statistics are output, the
*Actual Sample Size*, that is, the number of samples actually used for each marker, as distinct from samples containing missing data that is imputed, will always be output. - If you are using a genotypic spreadsheet, the
*Minor Allele (Test Allele)*and the*Major Allele*for each marker will always be output.

## Output from Linear Regression with Interactions¶

The linear regression option creates a **P-Values from Linear
Regression** spreadsheet.

The columns in this spreadsheet are:

*P-Value*This is the full-model-vs-reduced-model p-value.*-log10(P-Value)*The negative log (based 10) of the above p-value.For each of the following terms:

- the intercept (term 0)
- each additional covariate you have specified (if any)
- each interaction covariate
- the predictor itself, and
- each interaction term,

the following values are output from the full-model regression:

- The beta coefficient for this term (called
*Beta #*,*Beta pred*, or*Interaction # Beta*), and - The standard error associated with this coefficient. (The standard
error value is called
*Beta # SE*,*Beta pred SE*, or*Interaction # Beta SE*.)

Note

The standard error corresponding to at locus is computed as

where is the RSS value () for the full-model regression itself, is the number of samples, is the total number of terms in the full model (as explained above), and is the diagonal element of

where denotes a matrix whose columns are the full-model (fixed-effect) terms explained above, and denotes the generalized inverse of any matrix .

Any additional p-value outputs you have selected (see

*Additional Outputs*).The actual sample size and any genotype statistic outputs (see

*Additional Outputs*).

## Output from Single Locus Mixed Linear Model (EMMAX) with Interactions¶

If a kinship matrix was computed on the fly, this will be output
as **IBS Distance ((IBS2 * 0.5*IBS1)/ # non-missing markers)**. This
spreadsheet can be used again for further Mixed Model analysis.

Also created is the **P-Values from Single-Locus Mixed Model** spreadsheet.

The columns in this spreadsheet are:

*P-Value*This is the full-model-vs-reduced-model p-value.*-log10(P-Value)*The negative log (based 10) of the above p-value.For each of the following terms:

- the intercept (term 0)
- each additional covariate you have specified (if any)
- each interaction covariate
- the predictor itself, and
- each interaction term,

the following values are output from the full-model regression:

- The beta coefficient for this term (called
*Beta #*,*Beta pred*, or*Interaction # Beta*), and - The standard error associated with this coefficient. (The standard
error value is called
*Beta # SE*,*Beta pred SE*, or*Interaction # Beta SE*.)

Note

The standard error corresponding to at locus is computed from the full-model regression itself, which is, using the notation in

*Optimization when Gene-Environment Interaction Terms Are Included*,This standard error is

where is the RSS value from the regression (), is the number of samples, is the total number of terms in the full model (as explained above), and is the diagonal element of

where denotes the generalized inverse of any matrix A, and denotes the matrix whose columns are the full-model fixed-effect terms explained above, pre-multiplied by –that is,

Any additional p-value outputs you have selected (see

*Additional Outputs*).*Proportion of Variance Explained*: This column contains the proportion of variance explained by the effects of this marker. Using the notation of*Further Optimization When Covariates Are Present*, this proportion of variance explained will beThe actual sample size and any genotype statistic outputs (see

*Additional Outputs*).

The following will be output to the node change log of this spreadsheet:

The pseudo-heritability , which is

The variance and standard error of the pseudo-heritability and , see

*Estimating the Variance of Heritability*for the formula.The genetic component of variance

*Vg*().The error component of variance

*Ve*().Two final outputs also appear in the node change log, namely

*Proportion of genetic variance*, and*Prop. explained by fixed covariates*.

These are calculated as

and

where is the (actual) Root Sum of Squares (

*RSS*) value for the model consisting of the intercept and all covariates (including any additional covariates and including the interaction covariate(s), but no predictor and no interaction terms) as fixed effects, and is a reference RSS, which is the (actual) Root Sum of Squares computed from a model containing only the intercept as a fixed effect. The reference RSS may be written aswhere the results from using the EMMA technique to solve the null model equation