Bayes C and C-pi Genomic Prediction Analysis

Performing Bayes C and C-pi Analysis

Bayes C and Bayes C\pi are Bayesian methods that estimate the additive genetic merit of each sample and the allele substitution effect (ASE) of each marker [Habier2011].

Note

This method uses (with a genotypic spreadsheet) or assumes (with a numerically recoded spreadsheet) an additive genetic model.

Bayesian Genomic Prediction Dialog Window


During the Estimating Parameters stage, the run log updates once every hundred iterations with the number of markers currently included.

Bayesian Genomic Prediction Run Log


Options

  • Bayesian Method: There are currently two Bayesian methods for genomic prediction:

    • Bayes C-pi: The \pi variable is unknown and estimated.
    • Bayes C: The \pi variable is treated as known and kept fixed at the value set in Set initial value of pi.
    • Set initial value of pi: This will be either the initial value of \pi for Bayes C\pi or the fixed value of \pi for Bayes C.
  • Impute Missing Genotypic Data as: Missing genotypic data can be imputed by either of the following methods:

    • Homozygous major allele: All missing genotypic data will be recoded to 0.

    • Numerically as average value: All missing genotypic data will be recoded to the average of all non-missing genotype calls (using the additive model).

      Note

      If Correct for Gender (see below) is also selected, and there is non-missing data for both males and females in a given marker, averages for males and females will be computed and used separately.

  • Computation Method: Genotypic data can be represented in two ways:

    • As Is: Genotype values will be represented as 0, 1, or 2.
    • Centered: Genotype values will first be coded as 0, 1, or 2, and then the marker's mean is subtracted from each value.
  • Correct For Gender: Assumes each column is coded as if males were homozygous for the X-chromosome allele in question. Uses a modified version of the [Taylor2013] gender-correction algorithm (see Correcting for Gender and Gender Correction). Two values of the ASE are output, one for each gender.

    Note

    This option will only be available if there is a marker map and it contains at least one column in a chromosome that is listed in the assembly file as an allosome. The drop-down list will only contain chromosomes that are both allosomes and present in this spreadsheet.

  • Use Pre-Computed Genomic Relationship Matrix: To use, check this option, then click on Select Sheet and select the genomic relationship matrix spreadsheet from the window that is presented. To be valid, this spreadsheet must follow the rules outlined in Precomputed Kinship Matrix Option.

    Note

    The HWE variance sum \phi is re-calculated from the genotypic data being used for this analysis.

  • Correct for Additional Covariates: Allows additional fixed effects to be added to this model from columns of this spreadsheet. Fixed effect covariates can be binary, integer, real-valued, categorical, or (if actual genotypic data rather than recoded genotypic data is being used for the analysis) genotypic. In all cases, if a marker is used as an additional fixed effect, it will not be included in the analysis in any other way. To begin, check this option, then click on Add Columns to get a choice of spreadsheet columns to use.

  • Missing Phenotypes: To predict random effects (genomic merit/genomic breeding values) and the phenotypes for samples with missing phenotypes, select Predict random effects for samples with missing phenotypes. Otherwise, select Drop samples with missing phenotypes.

  • Chain Parameters: These options control the size of the chains and their characteristics.

    • Number of Iterations: Enter the number of iterations the MCMC loop should run for.
    • Burn-in: The number of samples to discard from the beginning of the chain. Set this to 0 for no burn-in.
    • Thinning: Only store one sample every x iterations, where x is the number entered in the box. Set this to 0 for no thinning.
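The imputation and coding options described above can be illustrated with a minimal sketch for a single marker column. This is not the software's implementation: the function names, the use of `None` for a missing call, and the choice to raise an error on an all-missing column are assumptions made for illustration. Gender-specific averages are not covered.

```python
from statistics import mean

def impute_column(calls, method="major"):
    """Fill missing additive-coded calls (None) in one marker column.

    calls:  list of 0, 1, 2 or None (None marks a missing genotype)
    method: "major" recodes missing calls to 0 (homozygous major allele);
            "average" recodes them to the mean of the non-missing calls.
    """
    if method == "major":
        fill = 0.0
    elif method == "average":
        observed = [c for c in calls if c is not None]
        if not observed:
            # Assumption of this sketch: an all-missing column is an error.
            raise ValueError("no non-missing calls to average")
        fill = float(mean(observed))
    else:
        raise ValueError(f"unknown imputation method: {method}")
    return [fill if c is None else float(c) for c in calls]

def center_column(values):
    """'Centered' coding: subtract the marker mean from every value."""
    m = mean(values)
    return [v - m for v in values]

marker = [0, 1, None, 2, 1]                # one marker, five samples
as_is = impute_column(marker, "average")   # [0.0, 1.0, 1.0, 2.0, 1.0]
centered = center_column(as_is)            # [-1.0, 0.0, 0.0, 1.0, 0.0]
```

With the "As Is" method the imputed column is used directly; with "Centered" the mean is removed after imputation, so each marker column sums to zero.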

Output

The following spreadsheets, logs, and plots will always be created:

  • Bayes C/C Pi estimates by marker: These are the estimates of the allele substitution effect (ASE) by marker, along with the absolute magnitude of the ASE and the normalized absolute magnitude of the ASE. If gender correction is applied, separate columns for the ASE, the absolute magnitude of the ASE, and the normalized Abs ASE will be output for both males and females. If there was a marker map on the original spreadsheet, it will be applied to this one.
  • Bayes C/C Pi estimates by sample: This spreadsheet contains the phenotype (selected dependent variable) of each sample and the random effects components for each sample. The standardized phenotype values will also be shown (see Standardizing Phenotype Values).
  • Bayes C/C Pi Run Log: A result viewer with the output from the run log.
  • Bayes C/C Pi Trace Spreadsheet: This will display the sampled values for \pi, \sigma_M^2, \sigma_e^2, and the number of markers included for each iteration, for males and females. This spreadsheet can be used to create autocorrelation plots; see Autocorrelation Plots.
  • Bayes C/C Pi Trace Plots: Plots of each column that appears in the Trace Spreadsheet. These can be used to determine if the variable is exploring the sampling space and/or if it has stabilized. This can help determine if and how much burn-in could be used.
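As a rough illustration of how a trace column might be examined outside the software, the sketch below discards a burn-in, applies thinning, and computes the sample autocorrelation at a given lag. The helper names are hypothetical, the trace is assumed to have been exported as a plain list of numbers, and which iteration within each thinning window is kept is an assumption of this sketch rather than documented behavior.

```python
from statistics import mean

def retained(trace, burn_in=0, thin=0):
    """Drop the first `burn_in` samples, then keep one sample per `thin`.

    A `thin` of 0 (or 1) keeps every sample, mirroring the dialog's
    "0 for no thinning" convention.
    """
    step = thin if thin > 1 else 1
    return trace[burn_in::step]

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom

# Made-up values standing in for an exported trace column.
pi_trace = [0.50, 0.62, 0.71, 0.80, 0.86, 0.84, 0.88, 0.85, 0.87, 0.86]
kept = retained(pi_trace, burn_in=4, thin=2)   # samples 4, 6, 8
lag1 = autocorrelation(pi_trace, 1)
```

Autocorrelation that decays slowly with increasing lag suggests strongly dependent samples, in which case more iterations or heavier thinning may be worth considering.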

The following values will be output to the node change log of each spreadsheet:

  • Phenotype has been standardized: The mean and standard deviation of the original phenotype data used to calculate the z-scores.

  • Pseudo-heritability: This is defined as

    ph = \hat {\sigma_g^2} / (\hat {\sigma_g^2} + \hat {\sigma_e^2})

  • Would-be pseudo-heritability: The pseudo-heritability that would have been obtained if the normalized genomic relationship matrix had been used. This is

    phw = 1 / (1 + \hat \delta / w),

    where w is the normalizing factor that would have been necessary to normalize the genomic relationship matrix according to the methodology of Normalizing the Kinship Matrix.

  • Var Error: The error component of variance, \hat \sigma_e^2.

  • Var Effects: The component of variance for the ASE, \hat \sigma_M^2.

  • Var Genomic: The genetic component of variance, \hat \sigma_g^2:

    \sigma_g^2 = \phi \sigma_M^2 G

  • Pi: The average \pi value sampled over all iterations.

  • Proportion of genetic variance

  • Proportion explained by fixed covariates

    See The Variance Partition Plot.

Note

If gender correction was selected, there will be two (male and female) versions of pseudo-heritability, would-be pseudo-heritability, var effects, var genomic, pi, and proportion of genetic variance.
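The two heritability formulas above can be checked by hand with a short sketch. The function and argument names are hypothetical. This section does not define \hat \delta, so the example assumes it is the ratio \hat \sigma_e^2 / \hat \sigma_g^2; under that assumption, a normalizing factor of w = 1 reproduces the ordinary pseudo-heritability.

```python
def pseudo_heritability(var_genomic, var_error):
    """ph = var_genomic / (var_genomic + var_error)."""
    return var_genomic / (var_genomic + var_error)

def would_be_pseudo_heritability(delta, w):
    """phw = 1 / (1 + delta / w), with w the normalizing factor."""
    return 1.0 / (1.0 + delta / w)

# Made-up variance components, for illustration only.
var_genomic = 2.0
var_error = 6.0

ph = pseudo_heritability(var_genomic, var_error)        # 0.25

# Assumption: delta is the ratio of error to genomic variance.
delta = var_error / var_genomic
phw_unit = would_be_pseudo_heritability(delta, w=1.0)   # 0.25, same as ph
phw_two = would_be_pseudo_heritability(delta, w=2.0)
```

When gender correction is selected, the same arithmetic would be applied separately to the male and female variance components.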

Genomic Relationship Matrix

Unless the Use Pre-Computed Genomic Relationship Matrix option is selected, a Bayes C/C Pi Genomic Relationship Matrix spreadsheet will be created. This spreadsheet can be used for subsequent Bayesian runs, or with GBLUP and the EMMAX and MLMM Mixed Model GWAS methods. This kinship matrix is the same as the one produced by GBLUP.