2.13.10. Predict Phenotypes From Existing Results
An alternative prediction procedure is to select Predict random effects for samples with missing phenotypes in the dialog of either Genotype -> Compute Genomic BLUP (GBLUP) (see Performing GBLUP Analysis) or Genotype -> Bayesian Genomic Prediction (see Bayes C and C-pi Genomic Prediction Analysis).
This feature predicts phenotypes from existing Allele Substitution Effects (ASE) and fixed effect coefficients, combined with the genotype and fixed effect information for the samples to be predicted.
The initial spreadsheet, if in genotypic format, will be numerically recoded to ensure that the major/minor alleles are the same as those used in GBLUP, Bayes, or K-Fold.
This method uses (with a genotypic spreadsheet) or assumes (with a numerically recoded spreadsheet) an additive genetic model.
Computation Method: The following methods are available to represent genotype values:
As is: Genotype values will be coded as 0, 1, or 2 (additive model). (This is how Bayes C/C-pi treats them.)
Centered: Genotype values will be coded in the additive model, but the mean will then be subtracted from them. (This is how GBLUP treats them).
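The two representations can be illustrated with a minimal sketch. It is not the program's own code: the function names and the `A_B` genotype call format are hypothetical, and it assumes that the mean being subtracted is the marker's mean over the samples in the spreadsheet.

```python
def additive_codes(calls, minor_allele, sep="_"):
    """Count copies of the minor allele in each genotype call (0, 1 or 2)."""
    return [call.split(sep).count(minor_allele) for call in calls]


def center(codes):
    """Subtract the marker mean from every additive code (assumed per-marker mean)."""
    mean = sum(codes) / len(codes)
    return [code - mean for code in codes]


calls = ["A_A", "A_A", "A_B", "B_B"]  # one marker, four samples
as_is = additive_codes(calls, minor_allele="B")
centered = center(as_is)
print(as_is)     # [0, 0, 1, 2]
print(centered)  # [-0.75, -0.75, 0.25, 1.25]
```

The ASE values being reused were estimated under one of these two codings, so the coding chosen here should match the one used by the method that produced them.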
Impute Missing Genotypic Data As: Missing genotypic data can be imputed by either of the methods listed below.
For Centered data, the imputed values are set before the data are centered.
Homozygous major allele: All missing genotypic data will be recoded to 0.
Numerically as average value: All missing genotypic data will be recoded to the average of all non-missing genotype calls (as represented numerically).
If Correct for Gender (see below) is also selected, and there is non-missing data for both males and females in a given marker, averages for males and females will be computed and used separately.
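The imputation options can be sketched as follows for a single marker. This is an illustration under stated assumptions, not the program's implementation: the function names, the use of `None` for a missing call, and the `"M"`/`"F"` sex labels are hypothetical, and falling back to the overall average when one sex has no non-missing calls is an assumption inferred from the description above.

```python
def impute(codes, method="average"):
    """Fill missing additive codes (None) for one marker."""
    observed = [c for c in codes if c is not None]
    if method == "major":
        fill = 0  # homozygous major allele
    elif method == "average":
        if not observed:
            raise ValueError("marker has no non-missing calls")
        fill = sum(observed) / len(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if c is None else c for c in codes]


def impute_average_by_sex(codes, sexes):
    """Average imputation with separate male and female averages."""
    observed = {"M": [], "F": []}
    for code, sex in zip(codes, sexes):
        if code is not None:
            observed[sex].append(code)
    if not (observed["M"] and observed["F"]):
        # Assumption: use the overall average if one sex has no data.
        return impute(codes, method="average")
    fill = {sex: sum(vals) / len(vals) for sex, vals in observed.items()}
    return [fill[sex] if code is None else code
            for code, sex in zip(codes, sexes)]


print(impute([0, None, 2, 1], method="major"))    # [0, 0, 2, 1]
print(impute([0, None, 2, 1], method="average"))  # [0, 1.0, 2, 1]
```

For Centered data, centering would be applied to the output of these functions, which matches the order described above (imputation first, centering second).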
Correct For Gender: Assumes the input column is coded as if the male were homozygous for the X-Chromosome allele in question.
If you select this option, and you have selected Centered data for your Computation Method, please select both a Dosage compensation value and a Normalization Algorithm. (For As is data, Equal X-linked genetic variance for males and females and Overall normalization will automatically be used.)
For the internal representations used for the X-Chromosome data, please see the beginning of Correcting the GRM for Gender Using Overall Normalization or of Correcting the GRM for Gender Using Normalization by Individual Marker for Centered data and the beginning of Gender Correction for As is data.
This option will only be available if there is a marker map and it contains at least one column in a chromosome that is listed in the assembly file as an allosome. The drop down list will only have chromosomes that are both allosomes and in this spreadsheet.
Transformed Data: If the phenotype data were standardized before the calculations were performed, as is done in Bayes C/C-pi, the mean and standard deviation can be entered here, and the resulting predicted phenotypes will be transformed using these values. Please see Standardizing Phenotype Values.
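As a rough illustration, and assuming that the transformation is the usual inverse of standardization (multiply by the standard deviation, then add the mean), the step could look like the sketch below. The function name is hypothetical, and the exact transformation applied by the program is described in Standardizing Phenotype Values.

```python
def unstandardize(predictions, mean, sd):
    """Map predictions from the standardized scale back to the phenotype scale.

    Assumes the original standardization was z = (y - mean) / sd.
    """
    return [mean + sd * z for z in predictions]


print(unstandardize([-1.0, 0.0, 0.5], mean=10.0, sd=2.0))  # [8.0, 10.0, 11.0]
```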
Correct for Additional Covariates: Allows additional fixed effects to be added to this model from this spreadsheet. Fixed effect covariates can be binary, integer, real-valued, categorical, or genotypic. In all cases, if a marker is chosen as an additional fixed effect, it will not be included in the analysis in any other way. To begin, check this option, then click on Add Columns to get a choice of spreadsheet columns to use.
Model Values: Select the spreadsheets containing the ASE and fixed effect coefficient values. The fixed effect spreadsheet should preferably, although it is not required to, have a “Reference Covariate?” column to ensure that the reference factors of any categorical covariates match between the fixed effect spreadsheet and the spreadsheet on which the prediction is being performed.
A single spreadsheet will be created, containing the following columns:
Predicted Phenotypes: Contains the predicted phenotypes for all samples from the original spreadsheet for which there are valid covariates. If no additional covariates were used, this column contains the predicted phenotypes for all samples from the original spreadsheet.
Random effect component: Contains the estimated random effect component for all samples from the original spreadsheet.
We can estimate the random effect component as
$$\hat{u} = M \hat{\alpha}$$
where $M$ is the genotype matrix and $\hat{\alpha}$ are the ASE values.
We can then predict phenotype values with the following model:
$$\hat{y} = X_f \hat{\beta}_f + \hat{u}$$
where $\hat{y}$ are the predicted phenotypes, $X_f$ is the fixed effects matrix, and $\hat{\beta}_f$ are the fixed effect coefficients. Please see the Bayes Problem Statement and the GBLUP Problem Statement for more information on the model.
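The calculation described above can be sketched in a few lines: the random effect component is the genotype matrix multiplied by the ASE values, and the predicted phenotype adds the fixed effects matrix multiplied by the fixed effect coefficients. The names `M`, `alpha`, `X`, and `beta` and all numbers are hypothetical and chosen only for illustration. The sketch assumes an intercept column in `X` and genotypes that are already recoded and imputed.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict_phenotypes(X, beta, M, alpha):
    """Return (predicted phenotypes, random effect components) per sample."""
    u = mat_vec(M, alpha)     # random effect component: genotypes times ASE
    fixed = mat_vec(X, beta)  # fixed effect contribution
    y_hat = [f + r for f, r in zip(fixed, u)]
    return y_hat, u


M = [[0, 1, 2],           # additive codes: 2 samples by 3 markers
     [2, 0, 1]]
alpha = [0.5, -0.25, 1.0]  # ASE value per marker
X = [[1, 30],              # intercept and one covariate per sample
     [1, 40]]
beta = [10.0, 0.5]         # fixed effect coefficients

y_hat, u = predict_phenotypes(X, beta, M, alpha)
print(u)      # [1.75, 2.0]
print(y_hat)  # [26.75, 32.0]
```

The two returned lists correspond to the Random effect component and Predicted Phenotypes columns of the output spreadsheet.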