Genetic Correlation of Two Traits using GBLUP

Performing GBLUP Analysis

The Genetic Correlation of Two Traits using GBLUP method performs a bivariate REML analysis on two selected traits. It estimates the genetic variance of each trait and the genetic covariance between the two traits that can be captured by all SNPs.

Note

This method uses (with a genotypic spreadsheet) or assumes (with a numerically recoded spreadsheet) an additive genetic model.

See Genetic Correlation Using GBLUP for more information.

Options


Genetic Correlation Dialog Window

  • Computations: The default computation for genetic correlation includes the residual covariance term \sigma^2_{e12} as a variance component.

    • Exclude the residual covariance from computations: Checking this box excludes the residual covariance term \sigma^2_{e12} from the computations. This may help convergence in certain cases. (See Genetic Correlation Using GBLUP.)
  • Impute missing genotypic data as: Missing genotypic data can be imputed by either of the following methods:

    • Homozygous major allele: All missing genotypic data will be recoded to 0.

    • Numerically as average value: All missing genotypic data will be recoded to the average of all non-missing genotype calls (using the additive model).

      Note

      If Correct for Gender (see below) is also selected, and there is non-missing data for both males and females in a given marker, averages for males and females will be computed and used separately.

  • Correct for Gender: Assumes the column is coded as if the male were homozygous for the X-Chromosome allele in question. Uses the [Taylor2013] gender-correction algorithm. (See Correcting the GRM for Gender Using Overall Normalization and Correcting the GRM for Gender Using Normalization by Individual Marker.) Two values of the ASE are output, one for each gender.

    • Choose Sex Column: Choose the spreadsheet column that specifies the gender of the sample. This column may either be categorical (“M” vs. “F”) or binary (0 = male, 1 = female).
    • Chromosome that is hemizygous for males: Usually the X Chromosome, which is the default.
  • Use Pre-Computed Genomic Relationship Matrix: To use, check this option, then click on Select Sheet and select the genomic relationship matrix spreadsheet from the window that is presented. To be valid, this spreadsheet must follow the rules outlined in Precomputed Kinship Matrix Option.

    Note

    When using a pre-computed genomic relationship matrix, the matrix M and the scaling factor \phi are re-calculated from the genotypic data being used for this analysis.

  • Correct for Additional Covariates: Allows additional fixed effects to be added to this model from columns of this spreadsheet. Fixed effect covariates can be binary, integer, real-valued, categorical or (if actual genotypic data rather than recoded genotypic data is being used for the analysis) genotypic. In all cases, if a marker is used as an additional fixed effect, it will not be included in the analysis in any other way. To begin, check this option, then click on Add Columns to get a choice of spreadsheet columns to use.
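
The two imputation options described above can be illustrated with a minimal Python sketch. It is not the SVS implementation. The function name impute_marker, the use of None for a missing call, the "M"/"F" sex labels, and the additive coding (0, 1, 2 = number of minor alleles) are all assumptions made for illustration.

```python
from statistics import mean


def impute_marker(calls, method="major", sexes=None):
    """Fill missing calls (None) in one marker column of additive codes.

    Assumed coding (not confirmed by the text above): 0, 1, 2 = number of
    minor alleles, so the homozygous major genotype is 0.
    """
    if method == "major":
        # "Homozygous major allele": every missing call becomes 0.
        return [0.0 if c is None else float(c) for c in calls]
    if method != "average":
        raise ValueError("method must be 'major' or 'average'")

    observed = [c for c in calls if c is not None]
    if not observed:
        return list(calls)  # nothing to average; left as-is in this sketch

    overall = float(mean(observed))
    fill = {"M": overall, "F": overall}
    if sexes is not None:
        by_sex = {
            s: [c for c, g in zip(calls, sexes) if g == s and c is not None]
            for s in ("M", "F")
        }
        # Separate averages are used only when both sexes have observed data.
        if by_sex["M"] and by_sex["F"]:
            fill = {s: float(mean(v)) for s, v in by_sex.items()}

    return [
        float(c) if c is not None else (overall if sexes is None else fill[sexes[i]])
        for i, c in enumerate(calls)
    ]


# Hypothetical marker column with two missing calls.
calls = [0, 2, None, 2, None, 1]
sexes = ["M", "M", "M", "F", "F", "F"]
print(impute_marker(calls, "major"))
print(impute_marker(calls, "average"))
print(impute_marker(calls, "average", sexes))
```

With the sex column supplied, the missing male call is filled from the male average and the missing female call from the female average, mirroring the note under "Numerically as average value".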

GBLUP Correlation Output

The Variance Components

As documented in Genetic Correlation Using GBLUP, genetic correlation always involves the following variance components:

  • V(G1) (\sigma^2_{u1}): Variance of y_1 due to random effects
  • V(G2) (\sigma^2_{u2}): Variance of y_2 due to random effects
  • C(G12) (\sigma^2_{u1u2}): Covariance between y_1 and y_2 due to random effects
  • V(E1) (\sigma^2_{e1}): Error variance for y_1
  • V(E2) (\sigma^2_{e2}): Error variance for y_2

and, if Exclude the residual covariance from computations is unchecked,

  • C(E12) (\sigma^2_{e12}): (Residual) covariance of y_1 and y_2 due to error terms.

Spreadsheet Output

The following two spreadsheets will always be created:

  • GBLUP estimates by sample: This spreadsheet, which contains a row for each sample, contains the following:

    • A column for each of the two phenotypes

    • For each phenotype i and for each variance component \sigma^2_{vc}, two columns: one containing the Gamma vector \hat{\gamma}_{vci} and the other containing the Random effect \hat{u}_{vci}. Both relate to phenotype i and to the random effect (or correlation term or error term) \sigma^2_{vc}.

      The Gamma and Random effect for phenotype 1 and for phenotype 2 for variance component \sigma^2_{vc} are defined by

      \begin{bmatrix} \hat{\gamma}_{vc1} \\ \hat{\gamma}_{vc2} \end{bmatrix} = \sigma^2_{vc} P \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}

      and

      \begin{bmatrix} \hat{u}_{vc1} \\ \hat{u}_{vc2} \end{bmatrix} = G_{vc} \sigma^2_{vc} P \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = G_{vc} \begin{bmatrix} \hat{\gamma}_{vc1} \\ \hat{\gamma}_{vc2} \end{bmatrix} ,

      where matrix P is defined in Finding the Variance Components Using the Average Information (AI) Technique, and G_{vc} can mean any of G1, G2, G12, E1, E2, or E12. (See Genetic Correlation Using GBLUP.)

    For example, suppose that Phenotype 1 is called PhenA and that Phenotype 2 is called PhenB. The columns that this spreadsheet will display will then be:

    • Phenotype 1: PhenA - PhenA’s values
    • Phenotype 2: PhenB - PhenB’s values
    • Gamma for PhenA from V(G1) - The Gamma value for PhenA for variance component V(G1)
    • Random effect for PhenA from V(G1) - The PhenA random effect from variance component V(G1)
    • Gamma for PhenB from V(G1) - The Gamma value for PhenB for variance component V(G1)
    • Random effect for PhenB from V(G1) - The PhenB random effect from variance component V(G1)
    • Gamma for PhenA from V(G2) - The Gamma value for PhenA for variance component V(G2)
    • Random effect for PhenA from V(G2) - And so on for the remaining columns
    • Gamma for PhenB from V(G2)
    • Random effect for PhenB from V(G2)
    • Gamma for PhenA from C(G12)
    • Random effect for PhenA from C(G12)
    • Gamma for PhenB from C(G12)
    • Random effect for PhenB from C(G12)
    • Gamma for PhenA from V(E1)
    • Random effect for PhenA from V(E1)
    • Gamma for PhenB from V(E1)
    • Random effect for PhenB from V(E1)
    • Gamma for PhenA from V(E2)
    • Random effect for PhenA from V(E2)
    • Gamma for PhenB from V(E2)
    • Random effect for PhenB from V(E2)

    and, if Exclude the residual covariance from computations is unchecked,

    • Gamma for PhenA from C(E12)
    • Random effect for PhenA from C(E12)
    • Gamma for PhenB from C(E12)
    • Random effect for PhenB from C(E12)
  • GBLUP estimates by marker: This spreadsheet, which contains a row for every marker, contains the following columns for each phenotype i and for each variance component \sigma^2_{vc}:

    • A column for the GBLUP estimates of the allele substitution effect (ASE) by marker, \hat{\alpha}_{vci} = M'\hat{\gamma}_{vci}/\phi, relating to phenotype i and random effect (or correlation term or error term) \sigma^2_{vc}.

      Note

      If Normalize by Individual Marker has been used, the actual ASE \hat{\alpha}_{vci} is found by taking the result as calculated above and dividing its k-th element by the factor \sqrt{2 p_k q_k}, where p_k and q_k are the major and minor allele frequencies for marker k, respectively.

      If gender correction is applied, separate columns for the ASE will be output for both males and females for each phenotype i and for each random effect (or correlation term or error term) \sigma^2_{vc}.

    For example, suppose that Phenotype 1 is called PhenA and that Phenotype 2 is called PhenB. Further suppose that no gender correction is applied. The columns that this spreadsheet will display will then be:

    • Allele substitution effect (ASE) V(G1) (PhenA) - The allele substitution effect for each marker relating to phenotype PhenA and random effect V(G1).
    • ASE V(G1) (PhenB) - The allele substitution effect for each marker relating to phenotype PhenB and random effect V(G1).
    • ASE V(G2) (PhenA) - The allele substitution effect for each marker relating to phenotype PhenA and random effect V(G2).
    • ASE V(G2) (PhenB) - And so on for the remaining columns
    • ASE C(G12) (PhenA)
    • ASE C(G12) (PhenB)
    • ASE V(E1) (PhenA)
    • ASE V(E1) (PhenB)
    • ASE V(E2) (PhenA)
    • ASE V(E2) (PhenB)

    and, if Exclude the residual covariance from computations is unchecked,

    • ASE C(E12) (PhenA)
    • ASE C(E12) (PhenB)

    The marker map will be applied to this spreadsheet.
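
As a rough illustration of the ASE formula \hat{\alpha}_{vci} = M'\hat{\gamma}_{vci}/\phi and of the per-marker adjustment described in the note above, the following Python sketch takes M, \hat{\gamma} and \phi as given inputs. How SVS builds M and \phi is documented elsewhere and is not reproduced here. The function name, the list-of-rows layout for M (samples by markers), and the minor_freqs argument are hypothetical.

```python
from math import sqrt


def allele_substitution_effects(M, gamma, phi, minor_freqs=None):
    """Return alpha = M' gamma / phi for one phenotype and one component.

    M           : list of rows, one per sample, one entry per marker (assumed layout)
    gamma       : Gamma vector for the same samples
    phi         : scaling factor, taken as given
    minor_freqs : optional minor allele frequency q_k per marker; if supplied,
                  element k is also divided by sqrt(2 * p_k * q_k), p_k = 1 - q_k
    """
    n_samples = len(M)
    n_markers = len(M[0])
    alpha = [
        sum(M[i][k] * gamma[i] for i in range(n_samples)) / phi
        for k in range(n_markers)
    ]
    if minor_freqs is not None:
        alpha = [a / sqrt(2.0 * (1.0 - q) * q) for a, q in zip(alpha, minor_freqs)]
    return alpha


# Made-up numbers: three samples, two markers.
M = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
gamma = [0.5, 1.0, -0.5]
print(allele_substitution_effects(M, gamma, phi=2.0))
print(allele_substitution_effects(M, gamma, phi=2.0, minor_freqs=[0.5, 0.5]))
```

The same function would be applied once per phenotype and per variance component to fill each ASE column.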

Node Change Log Output

The following will be output to the node change log of each spreadsheet (in addition to the options used and the summary statistics of numbers of records processed, etc.):

  • The variances for the full model. (This is the only model analyzed by the Genetic Correlation feature.)

    First, the number of iterations required for convergence and the log(likelihood) for this model are output.

    Then, a table is output with columns for the description (Source), the value (Variance), and the standard error (SE) of each of the following:

    • Each variance component, including error components and covariance components
    • Vp1 (V_{p1}), the sum of the variance components for the first phenotype y_1. These components are V(G1) (\sigma^2_{u1}) and V(E1) (\sigma^2_{e1}).
    • Vp2 (V_{p2}), the sum of the variance components for the second phenotype y_2. These components are V(G2) (\sigma^2_{u2}) and V(E2) (\sigma^2_{e2}).
    • Random-effect variance component V(G1) (\sigma^2_{u1}) divided by V_{p1}
    • Random-effect variance component V(G2) (\sigma^2_{u2}) divided by V_{p2}
  • The Genetic Correlation rG (and its standard error).

    The Genetic Correlation is defined as

    r_G = \frac{\sigma^2_{u1u2}}{\sqrt{\sigma^2_{u1} \sigma^2_{u2}}}

    or, using the alternative notation output by SVS,

    rG = \frac{C(G12)}{\sqrt{V(G1) V(G2)}}

    This value and its standard error appear as the last line of the above table of variances.
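
The derived quantities in the table can be reproduced by hand from the estimated variance components. The sketch below is a minimal Python illustration using made-up component values. The function name summarize_components is hypothetical, and standard errors are not computed because they depend on information this section does not describe.

```python
from math import sqrt


def summarize_components(vg1, vg2, cg12, ve1, ve2):
    """Derive Vp1, Vp2, the V(G)/Vp ratios and rG from variance components."""
    vp1 = vg1 + ve1  # Vp1 = V(G1) + V(E1)
    vp2 = vg2 + ve2  # Vp2 = V(G2) + V(E2)
    return {
        "Vp1": vp1,
        "Vp2": vp2,
        "V(G1)/Vp1": vg1 / vp1,
        "V(G2)/Vp2": vg2 / vp2,
        # rG = C(G12) / sqrt(V(G1) * V(G2))
        "rG": cg12 / sqrt(vg1 * vg2),
    }


# Made-up estimates, for illustration only.
summary = summarize_components(vg1=4.0, vg2=9.0, cg12=3.0, ve1=6.0, ve2=9.0)
for name, value in summary.items():
    print(name, value)
```

Note that the formula is undefined when V(G1) or V(G2) is estimated as zero, and the square root fails if either is negative. The sketch does not guard against those cases.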