# Mixed Linear Model Analysis with Interactions¶

Gene-Environment interactions may be analyzed using the **Mixed Linear
Model Analysis with Interactions** tool. The methods which are
available in this tool include:

- Linear regression with interactions
- Mixed Model GWAS using a single predictor (EMMAX) with interactions

These two methods are very similar to their non-interaction
counterparts (see *Mixed Linear Model Analysis*). The difference is
that a full-vs-reduced model is used which consists of the following for
the reduced model:

- The constant (“intercept”) term
- Any non-interacting covariates you may specify
- The interaction-term covariates you specify, not yet multiplied by the predictor
- The (single-SNP/single-locus) predictor

and for the full model, the above plus the following:

- The interaction terms. Each term is the element-by-element product of the corresponding interaction-term covariate with the predictor.

(These are in addition to the error term(s) of each model.)

For an explanation of the underlying mathematics, see
*The Mixed Model Equation for Gene-by-Environment Interactions*. Also see the mathematical notes in
*Output from the Linear Regression (fixed effects only) with Interactions* and *Output from Single Locus Mixed Linear Model (EMMAX) with Interactions* below.

Note

If any interaction, once formulated from the interaction-term covariate and the predictor, is collinear with the remaining regression covariates including any other interactions, that interaction will not be used in the analysis.

An example of such an interaction is when all the genotypes of a marker are the same for one category of a categorical variable which has been selected as an interaction variable. Such an interaction will either come out as all zeros or be collinear with the interaction term from which it was formulated.

## Data Types Accepted¶

This tool can perform the regression analysis directly on mapped
genotypic data, on mapped recoded genotypic data, or on mapped real-valued
data. The details noted in *Data Types Accepted* apply here.

## Performing Analysis¶

To perform mixed linear model analysis with interactions on genotypic
data, recoded genotypic data, or real data, open a spreadsheet and
select a column for the dependent variable. The dependent variable
must be either quantitative (real-valued or integer-valued) or a binary
case/control status column. To open the Mixed Linear Model Analysis with
Interactions window, select the **Genotype** > **Mixed Linear Model
Analysis with Interactions** menu item. This feature is currently
supported for spreadsheets with only one column set as dependent.
Categorical dependent columns are currently not supported.

The main tab of the Mixed Linear Model Analysis with Interactions
window (*Mixed Linear Model Analysis with Interactions Window (Main Tab)*) allows for various
methods and parameters to be set or changed. A list and brief
description of these options is as follows:

**Linear regression (fixed effects only) with Interactions:**This method performs a (fixed-effect-only) linear regression analysis with interactions. If the marker mapped data is genotypic, then the data is recoded first into the specified numeric model (additive, dominant or recessive). This linear regression analysis uses the same F test (Fast F Test) as the mixed linear models (EMMAX) and has a choice of imputation methods. See*Selecting the Interaction Terms*and*Correct for Additional Covariates*for instructions on including the interaction terms and other covariates in the model.**Single-locus mixed model GWAS (EMMAX) with Interactions:**[Kang2010] This method is an EMMAX implementation by [Vilhjalmsson2012] which analyzes both fixed effects containing interactions and random effects. See*Selecting the Interaction Terms*and*Correct for Additional Covariates*for instructions on including the interaction terms and other covariates in the model as fixed effects. A kinship matrix is always required to help describe the random effects. Either an identity-by-state (IBS) kinship matrix will be computed from the genotypic data or a pre-computed kinship matrix can be selected to speed up analysis. See*Precomputed Kinship Matrix Option*for more information about kinship matrices.

An additional parameter specific to the EMMAX with Interactions method is

**Use Pre-Computed Kinship Matrix (Cov. Matrix of Random Effects)**: See*Precomputed Kinship Matrix Option*.

Additional parameters for all regression models include:

**Genetic Model and Imputation:****Genetic model to use:**The genetic models available include:*Additive*: Recodes Major Homozygous genotype (dd) to 0, Heterozygous (Dd) to 1, and Minor Homozygous (DD) to 2.*Dominant*: Recodes Major Homozygous genotype (dd) to 0 and Heterozygous (Dd) and Minor Homozygous (DD) to 1.*Recessive*: Recodes Major Homozygous (dd) and Heterozygous (Dd) to 0 and Minor Homozygous (DD) to 1.

Note

If you are running this analysis from a numerically recoded spreadsheet, this prompt will read

**Genetic model used for recoding the original spreadsheet**, and you should enter which of the above recoding operations you have already performed (see*Recode Genotypes*) or would have performed to create the spreadsheet from which you are running this analysis.**Impute missing data as:**The options for imputing the missing genotypes include setting the missing genotype to*Homozygous major allele*: Always sets the missing genotypes to 0.*Numerically as average value*: Uses the average non-missing recoded genotype values for each marker as the value to use for recoding missing genotypes.Note

If

**Correct for Hemizygous Males**(see below) is also selected, and there is non-missing data for both males and females in a given marker, averages for males and females will be computed and applied separately.

**Correct for Hemizygous Males**: Recodes the X-Chromosome genotypes for males as 0 or 1. Assumes the column is coded as if the male were homozygous for the X-Chromosome allele in question.**Choose Sex Column:**Choose the spreadsheet column that specifies the gender of the sample. This column may either be categorical (“M” vs. “F”) or binary (0 = male, 1 = female).**Chromosome that is hemizygous for males:**Usually the X Chromosome, which is the default.

**Please Enter the Interaction Terms**: See*Selecting the Interaction Terms*.**Correct for Additional Covariates**: See*Correct for Additional Covariates*.

## Selecting the Interaction Terms¶

Interaction-term covariates can be selected for inclusion in the model from columns of this spreadsheet. As noted above, interaction-term covariates are included in the model in two different ways:

- Each interaction-term covariate that you select will act, by itself, as one reduced-model covariate.
- Each interaction-term covariate is also multiplied, element by element, with the predictor (the current marker/locus being analyzed) to create one of the actual interaction terms to be included in the full model (assuming it is not collinear with the remaining terms of the model).

Interaction-term covariates can be binary, integer, real-valued,
categorical or (if actual genotypic data rather than recoded genotypic
data is being used for the analysis) genotypic. In all cases, if a
marker is used as an interaction-term covariate, it will not be
included in the analysis in any other way. Click on **Add Columns** to
get a choice of spreadsheet columns to use.

## Correct for Additional Covariates¶

Additional fixed-effect reduced-model covariates (which are not a part
of any interaction term) can also be selected for inclusion in the
model from columns of this spreadsheet. These covariates can be
binary, integer, real-valued, categorical or (if actual genotypic data
rather than recoded genotypic data is being used for the analysis)
genotypic. In all cases, if a marker is used as an additional
covariate, it will not be included in the analysis in any other
way. To begin, check the **Correct for Additional Covariates** option,
then click on **Add Columns** to get a choice of spreadsheet columns
to use.

## Additional Outputs¶

The second tab of the Mixed Linear Model Analysis with Interactions
window is the same as the second tab of the Mixed Linear Model
Analysis window (see *Mixed Linear Model Analysis Window (Second Tab)*). This tab
allows for additional outputs to be added to either output
spreadsheet. Four of these selectable outputs are enhancements to the
p-value output:

**Bonferroni multiple testing correction****False discovery rate (FDR)****Output data for P-P/Q-Q plots****Output -log 10(P)**Negative log base 10 of the p-values. This is the optimal column to use for plotting the values in a genome browser.

The other four are genotype statistics, which are placed at the end of the spreadsheet.

**Call rate (fraction not missing)****Allele frequencies****Genotype counts****Allele Counts**

Note

- If you are using a numerically recoded spreadsheet and are not
using the additive model, only
**Call rate**is selectable as a genotype statistic. - Before the genotype statistics are output, the
*Actual Sample Size*, that is, the number of samples actually used for each marker, as distinct from samples containing missing data that is imputed, will always be output. - If you are using a genotypic spreadsheet, the
*Minor Allele (Test Allele)*and the*Major Allele*for each marker will always be output.

## Output from the Linear Regression (fixed effects only) with Interactions¶

The linear regression option creates a **P-Values from Linear
Regression** spreadsheet.

The columns in this spreadsheet are:

*P-Value*P-value of the linear regression.*-log10(P-Value)*The negative log (based 10) of the above p-value.For each of the following terms:

- the intercept (term 0)
- each additional covariate you have specified (if any)
- each interaction covariate
- the predictor itself, and
- each interaction term,

the beta coefficient for this term and the standard error associated with this coefficient are output from the full-model regression.

Note

If, for any given interaction term and marker, the interaction is collinear with the other regression covariates and is thus not used in the analysis, missing values will be output for the Beta and Beta SE for that marker and interaction.

As an example, if you have specified one additional numeric covariate called CovA and two numeric interaction covariates called IntA and IntB, your outputs in this part of the output spreadsheet will be

*Intercept Beta**Intercept Beta SE**CovA Beta**CovA Beta SE**IntA Beta*(the beta for IntA, taken by itself)*IntA Beta SE**IntB Beta*(the beta for IntB, taken by itself)*IntB Beta SE**Predictor Beta*(the beta for the predictor, taken by itself)*Predictor Beta SE**IntA Interaction Beta*(the beta for the interaction between IntA and the predictor)*IntA Interaction Beta SE**IntB Interaction Beta*(the beta for the interaction between IntB and the predictor)*IntB Interaction Beta SE*

Note

In the above example, if IntB’s interaction with the predictor of, for instance,

*Marker6*, is zero or is collinear with IntB itself, missing values will be output for*IntB Interaction Beta*and*IntB Interaction Beta SE*for*Marker6*.Note

The standard error corresponding to at locus is computed from the full-model regression itself, which is, using the notation in the (completely fixed-effect) linear-model note to

*Optimization when Gene-Environment Interaction Terms Are Included*,This standard error is

where is the RSS value () for the full-model regression itself, is the number of samples, is the total number of terms in the full model actually used for the -th marker (“actually used” rather than rejected for being collinear, as explained in the first note for

*Mixed Linear Model Analysis with Interactions*above), and is the diagonal element ofthe “full model squared inverse” at the -th marker, where denotes the generalized inverse of any matrix A, and denotes the matrix whose columns are the full-model fixed-effect terms explained above. That is,

Any additional p-value outputs you have selected (see

*Additional Outputs*).The actual sample size and any genotype statistic outputs (see

*Additional Outputs*).

## Output from Single Locus Mixed Linear Model (EMMAX) with Interactions¶

If a kinship matrix was computed on the fly, this will be output
as **IBS Distance ((IBS2 * 0.5*IBS1)/ # non-missing markers)**. This
spreadsheet can be used again for further Mixed Model analysis.

Also created is the **P-Values from Single-Locus Mixed Model** spreadsheet.

The columns in this spreadsheet are:

*P-Value*This is the full-model-vs-reduced-model p-value.*-log10(P-Value)*The negative log (based 10) of the above p-value.For each of the following terms:

- the intercept (term 0)
- each additional covariate you have specified (if any)
- each interaction covariate
- the predictor itself, and
- each interaction term,

the beta coefficient for this term and the standard error associated with this coefficient are output from the full-model regression.

Note

If, for any given interaction term and marker, the interaction is collinear with the other regression covariates and is thus not used in the analysis, missing values will be output for the Beta and Beta SE for that marker and interaction.

As an example, if you have specified one additional numeric covariate called CovA and two numeric interaction covariates called IntA and IntB, your outputs in this part of the output spreadsheet will be

*Intercept Beta**Intercept Beta SE**CovA Beta**CovA Beta SE**IntA Beta*(the beta for IntA, taken by itself)*IntA Beta SE**IntB Beta*(the beta for IntB, taken by itself)*IntB Beta SE**Predictor Beta*(the beta for the predictor, taken by itself)*Predictor Beta SE**IntA Interaction Beta*(the beta for the interaction between IntA and the predictor)*IntA Interaction Beta SE**IntB Interaction Beta*(the beta for the interaction between IntB and the predictor)*IntB Interaction Beta SE*

Note

In the above example, if IntB’s interaction with the predictor of, for instance,

*Marker6*, is zero or is collinear with IntB itself, missing values will be output for*IntB Interaction Beta*and*IntB Interaction Beta SE*for*Marker6*.Note

The standard error corresponding to at locus is computed from the full-model regression itself, which is, using the notation in

*The Mixed Model Equation for Gene-by-Environment Interactions*,This standard error is

where is the RSS value () for the full-model regression itself, is the number of samples, is the total number of terms in the full model actually used for the -th marker (“actually used” rather than rejected for being collinear, as explained in the first note for

*Mixed Linear Model Analysis with Interactions*above), and is the diagonal element ofthe “full model squared inverse” at the -th marker, where denotes the generalized inverse of any matrix A, and denotes the matrix whose columns are the full-model fixed-effect terms explained above, pre-multiplied by . That is,

Any additional p-value outputs you have selected (see

*Additional Outputs*).*Proportion of Variance Explained*: This column contains the proportion of variance explained by the effects of this marker. Using the notation of*Optimization when Gene-Environment Interaction Terms Are Included*, this proportion of variance explained will beThe actual sample size and any genotype statistic outputs (see

*Additional Outputs*).Any

*Comments*about the regression for this marker, including a note about any interaction that had to be dropped.

The following will be output to the node change log of this spreadsheet:

The scaling factor used for the kinship matrix, as computed according to

*Scaling the Kinship Matrix*.The pseudo-heritability , which is

The variance and standard error of the pseudo-heritability and , see

*Estimating the Variance of Heritability for One Random Effect*for the formula.The genetic component of variance

*Vg*().The error component of variance

*Ve*().Two final outputs also appear in the node change log, namely

*Proportion of genetic variance*, and*Prop. explained by fixed covariates*.

These are calculated as

and

where is the (actual) Root Sum of Squares (

*RSS*) value for the model consisting of the intercept and all covariates (including any additional covariates and including the interaction covariate(s), but no predictor and no interaction terms) as fixed effects, and is a reference RSS, which is the (actual) Root Sum of Squares computed from a model containing only the intercept as a fixed effect. The reference RSS may be written aswhere the results from using the EMMA technique to solve the null model equation