# Genomic Best Linear Unbiased Predictors Analysis Using Bins

## Performing Binned GBLUP Analysis

To better capture polygenic effects, the GBLUP method may be performed on markers that have been binned into categories, with a separate Genomic Relationship Matrix (GRM) being created (or imported) and used for every category (“bin”).

This method, as well as the feature *LD Score Computation and Binning*, is partly
inspired by the paper [Wainschtein2019], which describes recovering
the missing heritability for height and for body mass index (BMI) to
the level implied by pedigree studies.

Like the standard GBLUP method
(*Genomic Best Linear Unbiased Predictors Analysis*), binned GBLUP first
computes or imports a set of genomic relationship matrices, then computes
the “Genomic Best Linear Unbiased Predictor” (GBLUP) of additive genetic
merits by sample and of allele substitution effects (ASE) by
marker [VanRaden2008], [Taylor2013]. One main difference from
the standard GBLUP is that binned GBLUP computes a separate GBLUP of
additive genetic merits by sample for every GRM, and also reports the
total effect for every sample.
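
For orientation, the description above corresponds to a mixed model with one random effect per bin. The notation below is an assumption chosen to line up with the *V(e)* and *V(G_bin)* labels used in the output; it sketches the usual multi-GRM formulation and is not taken from the software itself.

```latex
% Assumed notation: K bins; G_k is the GRM built from the markers in bin k.
y = X\beta + \sum_{k=1}^{K} u_k + e, \qquad
u_k \sim N\!\left(0,\; G_k\,\sigma^2_{G_k}\right), \qquad
e \sim N\!\left(0,\; I\,\sigma^2_e\right)

% Implied covariance of the phenotypes:
\operatorname{Var}(y) = V = \sum_{k=1}^{K} G_k\,\sigma^2_{G_k} + I\,\sigma^2_e
```

In this notation, $u_k$ is the per-sample random effect for bin $k$, and the total effect for a sample is the sum of its $u_k$ values over all bins.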

Note

Since any marker contributes to only one GRM, the ASE for that marker is based only on that one GRM and its variance component, and on no other GRMs.

Note

As with the standard GBLUP, this method uses (with a genotypic spreadsheet) or assumes (with a numerically recoded spreadsheet) an additive genetic model.

Note

The Average Information REML algorithm
(*Finding the Variance Components Using the Average Information (AI) Technique*) is always used to find the
variance components for binned GBLUP.

Note

No sub-models are computed which consist of some variance components/bins but not others. Only the full model and the “completely reduced” model (that has no random effects at all) are computed.

## Workflow

Normally, a binned GBLUP analysis consists of two steps:

1. Categorize your markers into bins (“bin the markers”). This may be done through:

   - Using **Genotype > Quality Assurance and Utilities > LD Score Computation and Binning**. (See *LD Score Computation and Binning*.)
   - Using **File > Convert Genetic Marker Map into Spreadsheet**. This will have the effect of binning the markers by chromosome name.
   - Using **DNA-Seq > Variant Binning by Frequency Source**.
   - Any other method that produces a spreadsheet with
     - row labels corresponding to your markers, and
     - a binary, integer, or categorical column somewhere in the spreadsheet that designates a bin number or bin label corresponding to each marker.

2. Run this feature (**Genotype > Compute GBLUP Using Bins**), which will use the binning spreadsheet as one of its inputs.

Optionally, you could first use the feature **Genotype > Quality Assurance and Utilities > GBLUP Genomic Relationship matrix** and precompute your set of GRMs using its *Create Multiple GRMs by Bin Using a Binning Spreadsheet* feature. You would then use this set of GRMs as input to this feature (**Genotype > Compute GBLUP Using Bins**).
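
As an illustration of the “any other method” route described above, the following minimal Python sketch builds a binning table with one row per marker and one categorical bin column, here using the chromosome name as the bin. The marker names, the `Marker`/`Bin` column headers, and the CSV layout are all assumptions made for this example; how such a table is then imported as a spreadsheet is outside this sketch.

```python
import csv
import io


def bin_markers_by_chromosome(marker_map):
    """Return (marker, bin_label) rows, using each marker's chromosome as its bin.

    marker_map: iterable of (marker_name, chromosome) pairs (hypothetical input).
    """
    return [(marker, str(chrom)) for marker, chrom in marker_map]


def write_binning_table(rows, handle):
    """Write the rows as CSV: marker names as row labels, one bin column."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Marker", "Bin"])  # header names are illustrative only
    writer.writerows(rows)


# Hypothetical marker map used only for demonstration.
marker_map = [("rs101", "1"), ("rs102", "1"), ("rs201", "2"), ("rs901", "X")]
rows = bin_markers_by_chromosome(marker_map)

buffer = io.StringIO()
write_binning_table(rows, buffer)
binning_csv = buffer.getvalue()
print(binning_csv)
```

Any categorical, integer, or binary labeling could replace the chromosome here, as long as every marker receives exactly one bin.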

## Options

**Bins/Categories of Markers**: Choose the spreadsheet to be used for binning markers and the spreadsheet column containing the bin categories.

**Impute missing genotypic data as:** Missing genotypic data can be imputed by either of the following methods:

- *Homozygous major allele*: All missing genotypic data will be recoded to 0.
- *Numerically as average value*: All missing genotypic data will be recoded to the average of all non-missing genotype calls (using the additive model).

Note

If **Correct for Gender** (see below) is also selected, and there is non-missing data for both males and females in a given marker, averages for males and females will be computed and used separately.
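
The two imputation choices described above can be sketched for a single marker column as follows. This is a minimal illustration, assuming additive codes 0/1/2 with `None` marking a missing call and sex given as "M"/"F"; the function name and data are hypothetical, and the software's own handling may differ in details.

```python
def impute_column(genotypes, method="average", sexes=None):
    """Impute missing additive genotype codes (None) in one marker column.

    method="major":   recode missing calls to 0 (homozygous major allele).
    method="average": recode missing calls to the mean of the non-missing codes.
    sexes: optional list of "M"/"F"; if both sexes have non-missing data,
           the average is computed and applied separately for each sex.
    """
    if method == "major":
        return [0 if g is None else g for g in genotypes]
    if method != "average":
        raise ValueError("method must be 'major' or 'average'")

    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("marker has no non-missing genotypes")
    overall = sum(observed) / len(observed)

    # Per-sex means, kept only for sexes that have non-missing data.
    by_sex = {}
    if sexes is not None:
        for sex in ("M", "F"):
            vals = [g for g, s in zip(genotypes, sexes)
                    if g is not None and s == sex]
            if vals:
                by_sex[sex] = sum(vals) / len(vals)
    use_sex = len(by_sex) == 2

    imputed = []
    for i, g in enumerate(genotypes):
        if g is not None:
            imputed.append(g)
        elif use_sex:
            imputed.append(by_sex.get(sexes[i], overall))
        else:
            imputed.append(overall)
    return imputed


column = [0, 1, None, 2, None, 1]
sexes = ["M", "F", "M", "F", "F", "M"]

as_major = impute_column(column, method="major")
as_average = impute_column(column, method="average")
as_average_by_sex = impute_column(column, method="average", sexes=sexes)
```

With this data the overall mean of the observed codes is 1.0, while the male and female means are 0.5 and 1.5, so the sex-specific variant fills the two missing calls with different values.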

**Correct for Gender**: Assumes the column is coded as if the male were homozygous for the X-Chromosome allele in question. Uses the [Taylor2013] gender-correction algorithm. (See *Correcting the GRM for Gender Using Overall Normalization* and *Correcting the GRM for Gender Using Normalization by Individual Marker*.) Two values of the ASE are output, one for each gender.

- **Choose Sex Column**: Choose the spreadsheet column that specifies the gender of the sample. This column may either be categorical (“M” vs. “F”) or binary (0 = male, 1 = female).
- **Chromosome that is hemizygous for males**: Usually the X Chromosome, which is the default.
- **Dosage compensation**: Select the dosage compensation to be used. Equal X-Linked Variance is the default.

**Use Pre-Computed Genomic Relationship Matrices**: To use, check this option, then click on **Select Sheet** and, from the window that is presented, select the spreadsheet that lists the set of genomic relationship matrix spreadsheets. The list should have the categories/bins as row labels and the spreadsheet numbers in the first column. To be valid, the individual pre-computed GRM spreadsheets must each follow the rules outlined in *Precomputed Kinship Matrix Option*.

**Correct for Additional Covariates**: Allows additional fixed effects to be added to this model from columns of this spreadsheet. Fixed effect covariates can be binary, integer, real-valued, categorical or (if actual genotypic data rather than recoded genotypic data is being used for the analysis) genotypic. In all cases, if a marker is used as an additional fixed effect, it will not be included in the analysis in any other way. To begin, check this option, then click on **Add Columns** to get a choice of spreadsheet columns to use.

**Normalization Algorithm (Used or Assumed) for the GRM:** If pre-computed GRMs are not selected, this choice influences the GRM computations. In any case, this choice influences ASE computations. Overall normalization is the default. (See *The Genomic Relationship Matrix*.)

**Missing Phenotypes**: To predict random effects (genomic merit/genomic breeding values) and the phenotypes for samples with missing phenotypes, select **Predict random effects for samples with missing phenotypes** (see *Genomic Prediction*). Selecting this will cause samples with missing phenotypes to be included in the ASE calculations. Otherwise, select **Drop samples with missing phenotypes**.

Note

An alternative prediction procedure is to use **Genotype > Predict Phenotypes From Existing Results**. See *Predict Phenotypes From Existing Results*.

## Binned GBLUP Output

### Spreadsheet Outputs

Unless you have selected **Use Pre-Computed Genomic Relationship Matrices**, the following spreadsheets are output:

- One **GBLUP Genomic Relationship Matrix** for every category/bin. This is the relationship between pairs of samples, as determined by actual genomic similarity (or dis-similarity) between samples, over the markers contained in the bin.
- A **Genomic Relationship Matrix List** spreadsheet, which, for each row, has the category/bin number or label as a row label and the spreadsheet number of the bin’s GRM in that row’s first column.
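
To make the idea of “one GRM per bin” concrete, here is a minimal Python sketch that splits a genotype matrix by bin label and builds one relationship matrix per bin. It assumes complete additive codes (0/1/2) and a VanRaden-style overall normalization, in which centered genotypes are multiplied together and divided by twice the sum of p(1 - p) over the bin's markers. That formula is an assumption for illustration; the exact computation the software performs (including gender correction) is described in *The Genomic Relationship Matrix*. All names and data are hypothetical.

```python
def grm_for_bin(genotypes):
    """Build one GRM from a samples-by-markers list of additive codes (0/1/2).

    Assumed formula: G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z holds the
    genotype codes centered by twice the allele frequency p_j of each marker.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("no marker in this bin shows any variation")
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / scale
             for k in range(n)] for i in range(n)]


def grms_by_bin(genotypes, marker_bins):
    """Return {bin_label: GRM}, using only the markers assigned to each bin."""
    grms = {}
    for label in sorted(set(marker_bins)):
        cols = [j for j, b in enumerate(marker_bins) if b == label]
        subset = [[row[j] for j in cols] for row in genotypes]
        grms[label] = grm_for_bin(subset)
    return grms


# Three samples, four markers; the first two markers are in bin "A",
# the last two in bin "B" (illustrative data only).
genotypes = [[0, 1, 2, 0],
             [1, 1, 0, 2],
             [2, 0, 1, 1]]
marker_bins = ["A", "A", "B", "B"]
grms = grms_by_bin(genotypes, marker_bins)
```

Because each marker is assigned to exactly one bin, each marker's genotypes enter exactly one of the resulting matrices.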

The following four spreadsheets will always be created:

- **GBLUP estimates by sample**: This spreadsheet contains a column for the phenotype (selected dependent variable) of each sample, a column for the *Total random effect component* for each sample, which is the sum of random effects from all individual random effect components for that sample, and, for each variance component/bin, a column containing the *Random effects component* related to that variance component/bin for each sample. If you have selected **Predict random effects for samples with missing phenotypes**, another column containing the *Predicted phenotype* for each sample will be inserted after the first column, which will then be called the *Actual Phenotype*.
- **GBLUP fixed effect coefficients**: Contains the coefficient corresponding to each fixed effect. If there were no fixed effects selected, the only coefficient will be the intercept. For categorical covariates, the reference category will be listed with missing for the coefficient and a 1 in the “Reference Covariate?” column. The “Reference Covariate?” column will contain a 0 for all non-categorical and non-reference covariates.
- **GBLUP estimates by marker**: This spreadsheet first contains a column showing the bin/category for each marker. Next is a column for the GBLUP estimate of the allele substitution effect (ASE) for each marker (as computed from the random effects corresponding to that marker’s bin/category/GRM/variance component). The final column shows the absolute magnitudes of the ASE values. If gender correction is applied, separate columns for the ASE and the absolute magnitude of the ASE will be output for both males and females. The marker map is applied to this spreadsheet.
- **Sampling Var/Covar Matrix of the Variance Comp Estimates for the Full Model**: These are the variances and covariances of the estimates of the variance components. A row and a column are created for each of the variance components (*V(e)*, *V(G_bin1)*, etc.). The diagonal contains the sampling variance of each variance component, while each off-diagonal element contains the covariance between the row’s variance component and the column’s variance component.
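
As a reading aid for the sampling variance/covariance matrix described above, the sketch below takes such a matrix and derives a standard error for each variance component (the square root of its diagonal entry) and for a sum of components (the square root of the sum of all entries in the corresponding sub-block). The numbers are invented for illustration, and whether the software derives its reported *SE* values in exactly this way is an assumption.

```python
import math


def standard_errors(labels, cov):
    """Standard error of each estimate: square root of its sampling variance."""
    return {label: math.sqrt(cov[i][i]) for i, label in enumerate(labels)}


def se_of_sum(labels, cov, chosen):
    """Standard error of the sum of the chosen components.

    Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b), i.e. the sum of every entry
    in the sub-block of the matrix that belongs to the chosen components.
    """
    idx = [labels.index(name) for name in chosen]
    variance = sum(cov[i][j] for i in idx for j in idx)
    return math.sqrt(variance)


# Illustrative (made-up) sampling variance/covariance matrix.
labels = ["V(e)", "V(G_bin1)", "V(G_bin2)"]
cov = [
    [0.0400, -0.0100, -0.0050],
    [-0.0100, 0.0900, -0.0200],
    [-0.0050, -0.0200, 0.0625],
]

se = standard_errors(labels, cov)
se_total_genetic = se_of_sum(labels, cov, ["V(G_bin1)", "V(G_bin2)"])
```

A negative covariance between two bins, as in this example, makes the standard error of their sum smaller than it would be if the two estimates were independent.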

### Node Change Log Output

The following will be output to the node change log of each spreadsheet:

- The options used
- Summary statistics, including numbers of samples and markers scanned and analyzed

The following will additionally be output to the node change log of all spreadsheets other than (if they were generated) the GRM spreadsheets and the GRM list spreadsheet:

The number of markers processed for each bin.

The variances for the full model:

- First, the number of iterations required for convergence and the log(likelihood) for the full model are output.
- Then, for the full model, a table is output containing columns for the description of (*Source*), the value of (*Variance*), and the standard error (*SE*) of
  - The error variance component *V(e)*
  - The variance component *V(G_bin)* for each bin
  - *Vp*, the sum of all variance components (including *V(e)*)
  - Each random-effect variance component divided by *Vp*
  - *Sum of V(G)/Vp*, the sum of all random-effect variance components divided by *Vp*
  - The variance per marker for each bin.

The variances for “the completely reduced model”, that is, the model containing only the error term *V(e)*.

The same output format is used as for the variances for the full model.
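
The derived rows of the variance table follow directly from the component estimates. The minimal sketch below computes *Vp*, each component's share of *Vp*, and *Sum of V(G)/Vp* from made-up values. The per-marker figure is computed here as the bin's variance component divided by the number of markers in that bin, which is an assumption about how “variance per marker” is defined; standard errors are not covered.

```python
def summarize_variances(v_e, v_g, markers_per_bin):
    """Derive the summary rows of the variance table from component estimates.

    v_e: error variance component V(e).
    v_g: {bin_label: V(G_bin)} for every bin.
    markers_per_bin: {bin_label: number of markers processed in that bin}.
    """
    total_genetic = sum(v_g.values())
    vp = v_e + total_genetic  # Vp: sum of all variance components, incl. V(e)
    return {
        "Vp": vp,
        "V(G_bin)/Vp": {b: v / vp for b, v in v_g.items()},
        "Sum of V(G)/Vp": total_genetic / vp,
        # Assumed definition: component variance spread evenly over its markers.
        "Variance per marker": {b: v_g[b] / markers_per_bin[b] for b in v_g},
    }


# Illustrative (made-up) estimates.
summary = summarize_variances(
    v_e=0.5,
    v_g={"bin1": 0.3, "bin2": 0.2},
    markers_per_bin={"bin1": 300, "bin2": 100},
)
```

For the completely reduced model there are no *V(G_bin)* entries, so *Vp* equals *V(e)* and the genetic share is zero.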

An overall random effects likelihood test. The following is output for this test:

- *logL*: The log-likelihood of the full model
- *logL0*: The log-likelihood of the null model
- *LRT*: The likelihood ratio test statistic
- *df*: The degrees of freedom for this test
- *pval*: The p-value for this test

Any messages from the algorithm concerning steps that had to be taken to make the algorithm converge.
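
The overall random effects likelihood test listed above can be sketched as follows. The sketch assumes the conventional statistic LRT = 2 (logL - logL0) and, as a simplification, a plain chi-square reference distribution with the given degrees of freedom. Both are assumptions: the source does not state the formula or the reference distribution, and tests of variance components are sometimes evaluated against a different (for example, mixture) distribution. The input values are invented.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    t = x / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma
    # function Q(a, t): Q(1, t) = exp(-t) and Q(1/2, t) = erfc(sqrt(t)).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    # Recurrence: Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(logl, logl0, df):
    """Return (LRT, p-value) under the assumptions stated in the lead-in."""
    lrt = max(0.0, 2.0 * (logl - logl0))
    return lrt, chi2_sf(lrt, df)


# Illustrative (made-up) log-likelihoods for a two-bin model.
lrt, pval = likelihood_ratio_test(logl=-1200.0, logl0=-1205.0, df=2)
```

Treating *df* as the number of bins in this example is also an assumption; use the *df* value reported in the node change log when checking results by hand.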

## Precomputing the Set of Genomic Relationship Matrices

See *Separately Computing the Genomic Relationship Matrix* for information about pre-computing
the set of Genomic Relationship Matrices for a spreadsheet of markers
that have been binned.