2.13.2. Large Kinship Matrices or Large Numbers of Samples

The Problem

If there are more than 8,000 samples, the memory requirements for computing mixed models become substantial, or even prohibitive. Additionally, with this many samples, computing eigenvalues and eigenvectors takes much longer for methods such as GBLUP using the EMMA algorithm (see Performing GBLUP Analysis) or for Principal Components Analysis (PCA) (see Genotypic Principal Component Analysis and Numeric Principal Component Analysis).

The Solution

Thus, for GBLUP using the EMMA algorithm, for K-Fold cross validation using GBLUP, and for Principal Components Analysis (PCA) (Step 2: Find the Principal Components), SVS offers a large-data mode for some of the computations and intermediate data storage. This mode consists of:

  • Storing intermediate matrix data in scratch spreadsheets. The data for one matrix may be stored either in one spreadsheet or in many separate spreadsheets, each containing a small enough portion of the data to be held conveniently in (random-access) memory.

  • Computing basic matrix operations piece by piece: reading sections of the data from the spreadsheet (or spreadsheets) as necessary, performing computations on these sections and, if the computational output is itself a new large matrix, writing the results section by section to a new large spreadsheet or to small separate spreadsheets.

  • For large-data GBLUP using EMMA and K-Fold cross-validation using GBLUP: In order to find a matrix B (Solving the Mixed Model Equation Using EMMA) based on the matrix H = K + \delta I, \delta itself is first determined, then a Cholesky decomposition of matrix H is performed using large-data matrix techniques. This avoids the need to determine the eigenvalues and eigenvectors of the kinship matrix K itself.

  • To find the eigenvalues and the corresponding eigenvectors of the large matrix that must be dealt with, which is:

    • For GBLUP using EMMA and K-Fold cross-validation using GBLUP: Matrix S(K+I)S

    • For large-data Principal Components Analysis (PCA): the Wishart Matrix

    SVS uses an adaptation of the methodology of [Halko2010] (see Decomposition of Large Matrices Using Random Numbers below), along with the large-data techniques mentioned above.

How to Use

  • For GBLUP using the EMMA algorithm and for K-Fold cross validation using GBLUP: If there are more than 8,000 samples, an extra dialog will pop up, asking which “mode” you wish to use (see the table shown in For GBLUP using EMMA: Large N Performance Tradeoffs), and whether you wish to use large-data techniques at all.

  • For Principal Components Analysis (PCA): There is a separate dialog and utility available for finding a few principal components of a dataset with a large number of samples. (This dialog and utility are available even if you have fewer than 8,000 samples.) See Finding Principal Components When the Sample Size is Large.

Note

Large-data mode is not currently available for GBLUP using the Average Information (AI) REML algorithm.

Decomposition of Large Matrices Using Random Numbers

The authors of [Halko2010] describe a technique using random numbers for finding an orthonormal matrix Q whose range approximates the range of another, much larger matrix A. This matrix Q may then be used to (for instance) approximate the largest eigenvalues and their corresponding eigenvectors of a symmetric matrix A. Only one pass is needed for the overall error of expressing matrix A using its approximate largest eigenvalues and eigenvectors to be of the same magnitude as the largest of the next-smaller eigenvalues which were not approximated. [Halko2010] also describes a method of increasing accuracy, involving post-multiplying A by Q, for cases where the eigenvalues do not descend in value very quickly and/or large data needs to be dealt with.

Golden Helix SVS adapts this technique using two layers of iteration to approximate all requested eigenvalues and eigenvectors of such a large-size matrix A. The top layer of iteration finds batches of the largest eigenvalues of first A, then of A - UDU', where UDU' is the eigenvalue/eigenvector decomposition formed from the largest eigenvalues of A, then of A - UDU' - U_2D_2U_2', where U_2D_2U_2' is the eigenvalue/eigenvector decomposition formed from the largest eigenvalues of A - UDU' (which are the next-largest set of eigenvalues of A), and so on. The second layer uses the [Halko2010] technique to obtain estimates of the largest eigenvalues and eigenvectors of A (or of A - UDU', etc.).

  • For GBLUP using EMMA: All eigenvalues and all eigenvectors are estimated.

  • For Principal Components Analysis (PCA): Only the requested number of eigenvalues and eigenvectors (components) are estimated. If only a few principal components are needed, the algorithm will never need to deal with any large matrix other than A itself.

Approximating the Largest Eigenvalues in a Matrix

Suppose matrix A is a (symmetric) positive semidefinite m \times m matrix for which we need to find approximations to the k largest eigenvalues and their eigenvectors. The basic technique described in [Halko2010] is as follows (a code sketch follows this list):

  • Create a (“tall and slender”) m \times k matrix \Omega which contains elements from the Standard Normal Distribution (normally distributed with mean 0 and variance 1). Presumably, dimension k is small enough so that this matrix fits comfortably into (random-access) memory.

  • Create the m \times k matrix Y as the matrix product A\Omega.

  • Find the set of orthonormal vectors which span the same space as Y, using, for instance, the QR approach or the Gram-Schmidt procedure. Call the matrix containing this set Q.

  • Now, compute B = Q'AQ and find the eigenvalues and eigenvectors of B. Since B is only k \times k and fits very easily into (random-access) memory, this is a much more straightforward and quickly-solved problem than that of finding the eigenvalues and eigenvectors of A itself.

    Call the matrix of resulting eigenvectors U and the matrix containing the eigenvalues on its diagonal (and zeros elsewhere) \Lambda, so that we have

    B = U\Lambda U' .

  • Now, we approximate A as

    A \approx QU \Lambda U'Q' = QBQ' .

  • To increase accuracy, [Halko2010] suggests modifying this algorithm as follows (the “power method”):

    • Follow the first three steps above to create a Q matrix, call this Q_1.

    • Create matrix Y_2 = A Q_1, then find the set of orthonormal vectors that span the same space as Y_2, and call this Q_2.

    • Repeat the above step q times, to get Q_q.

    • Now, find the eigenvalues and eigenvectors of B = Q_q' A Q_q and follow the last two steps above to approximate A.

  • From equation (6.5) of [Halko2010], we see that the accuracy of either of these two methods (the original method or the power method) is

    \left\| A - Q_q U \Lambda U' Q_q' \right\| \lessapprox (k m)^{1/(2q)} \lambda_{k+1} ,

    where \lambda_{k+1} is the (k+1)-st largest eigenvalue of A, and for the original method, we take q = 1.

  • As an alternative method of increasing accuracy, which may be used at the same time as the “power method” described above, the authors of [Halko2010] suggest using “oversampling”: to obtain the k largest eigenvalues and their eigenvectors, use a (“tall and slender”) matrix \Omega of dimensions m \times k_{ov}, where k_{ov} > k. (This will make matrix Q, and matrices Q_1, etc., if used, also be of dimensions m \times k_{ov}.) Normally, k_{ov} does not need to be greater than 2k.

  • Since the matrix \Omega is somewhat like a “memory buffer”, call the “width” k_{ov} of the “tall and slender” m \times k_{ov} matrix \Omega the “buffer width”.
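The following is a minimal in-memory NumPy sketch of the basic technique and its “power method” and “oversampling” variations described above. The function name randomized_eig and its defaults are ours, not SVS names, and a true large-data implementation would stream sections of A from scratch spreadsheets rather than hold A in memory.

    import numpy as np

    def randomized_eig(A, k, k_ov=None, q=1, rng=None):
        """Approximate the k largest eigenvalues/eigenvectors of a symmetric PSD matrix A."""
        rng = np.random.default_rng(rng)
        m = A.shape[0]
        k_ov = min(2 * k, m) if k_ov is None else k_ov     # "oversampling" buffer width
        Omega = rng.standard_normal((m, k_ov))             # "tall and slender" Gaussian matrix
        Q, _ = np.linalg.qr(A @ Omega)                     # orthonormalize Y = A * Omega
        for _ in range(q - 1):                             # extra "power method" passes
            Q, _ = np.linalg.qr(A @ Q)
        B = Q.T @ A @ Q                                    # small k_ov x k_ov problem
        lam, U = np.linalg.eigh(B)                         # eigenvalues in ascending order
        lam, U = lam[::-1][:k], U[:, ::-1][:, :k]          # keep the k largest
        return lam, Q @ U                                  # approximate eigenpairs of A

For example, randomized_eig(A, 10, q=3) corresponds to the power method with q = 3 and the default oversampling of k_{ov} = 2k.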

For GBLUP using EMMA: How Large Will the Buffer Width Be?

For GBLUP using EMMA, Golden Helix SVS attempts to use

k_{ov} = 46 \sqrt{m}

as the ideal buffer width. This is an optimum width based on preventing the calculations of the Gram-Schmidt step from taking excessive and unnecessary time.

If there is insufficient RAM to accommodate this would-be buffer width, the buffer width k_{ov} is reduced sufficiently to allow the calculations to fit into RAM.

For PCA: How Large Will the Buffer Width Be?

For PCA, Golden Helix SVS uses the maximum of

  • A minimal width for PCA computation, namely

    k_{ovPCA} = 2 p + 15

and

  • An optimal width based on the number of samples m, namely

    k_{ovOpt} = \frac{46 \sqrt{m}}{200} = 0.23 \sqrt{m} .

In the unusual event that there is insufficient RAM for this would-be buffer width, the buffer width k_{ov} is reduced sufficiently to allow the calculations to fit into RAM.

Note that if only a small number p of principal components are requested, the buffer width used will be far smaller than what would have been used for the GBLUP EMMA algorithm, thus saving calculation time.
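As a small illustration of the two formulas above, the buffer widths could be computed as follows (the function names are ours, and the RAM-based reduction described above is omitted):

    import math

    def buffer_width_gblup_emma(m):
        """Ideal buffer width for GBLUP using EMMA: k_ov = 46 * sqrt(m)."""
        return int(46 * math.sqrt(m))

    def buffer_width_pca(m, p):
        """PCA buffer width: the larger of 2p + 15 and 0.23 * sqrt(m)."""
        return int(max(2 * p + 15, 0.23 * math.sqrt(m)))

    print(buffer_width_gblup_emma(20000))   # 6505
    print(buffer_width_pca(20000, 10))      # 35 -- far smaller than the EMMA width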

For GBLUP using EMMA: Finding All the Eigenvalues of Matrix S(K+I)S

The technique in Approximating the Largest Eigenvalues in a Matrix works reasonably well for a matrix whose eigenvalues descend in value quickly, or reach almost zero quickly. However, a typical kinship matrix will have many or most of its eigenvalues on the order of magnitude of one. Similarly, the non-zero eigenvalues of the matrix we are interested in, namely S(K+I)S, will often range from a little over 1 to over 5 or more. (The number of zero eigenvalues is exactly one more than the number of covariates you have specified.) This renders the above [Halko2010] technique only marginally useful by itself.

A further difficulty is that the EMMA algorithm needs the estimates of the smaller non-zero eigenvalues of S(K+I)S and their eigenvectors to be essentially as precise as the estimates of the larger eigenvalues and their eigenvectors. This is because for each non-zero eigenvalue, part of the algorithm divides by an expression derived from that eigenvalue, an expression which may come close to zero. (That expression is the eigenvalue minus one plus the value of \delta, and \delta may become as small as 10^{-5}.) (See Finding the Variance Components.)

Therefore, in Golden Helix SVS, we start with the following steps to enhance the [Halko2010] technique for GBLUP using EMMA:

  • Start with the original kinship matrix K.

  • Create the (positive definite) matrix \alpha_1 I + S(K + \alpha_2 I)S, where S = I - X(X'X)^{-1}X', and where, in SVS, we use the value .25 for both \alpha_1 and \alpha_2. Call this matrix A.

    Note

    We avoid actually storing matrix A by itself anywhere, but we simply use large-data techniques to perform the entire operation of finding \alpha_1 I + S(K + \alpha_2 I)S with a minimum of memory usage.

    Note

    Mathematically speaking, those eigenvalues of \alpha_1 I + S(K + \alpha_2 I)S equal to \alpha_1 correspond to, and have the same eigenvectors as, the zero eigenvalues of S(K+I)S, while each of the remaining eigenvalues of \alpha_1 I + S(K + \alpha_2 I)S may be converted to the corresponding remaining eigenvalues of S(K+I)S, for which the corresponding eigenvectors are the same, by adding (1 - \alpha_1 - \alpha_2).
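As an in-memory illustration of this construction and of the eigenvalue conversion in the second note above (SVS itself never materializes A, and the function and variable names here are ours):

    import numpy as np

    def shifted_matrix_emma(K, X, alpha1=0.25, alpha2=0.25):
        """A = alpha1*I + S(K + alpha2*I)S, with S = I - X(X'X)^{-1}X'."""
        n = K.shape[0]
        S = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # projection away from covariates
        return alpha1 * np.eye(n) + S @ (K + alpha2 * np.eye(n)) @ S

    def to_SKS_eigenvalue(lam_A, alpha1=0.25, alpha2=0.25):
        """Convert a non-alpha1 eigenvalue of A to the corresponding eigenvalue of S(K+I)S."""
        return lam_A + (1.0 - alpha1 - alpha2)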

For PCA: Finding the Requested Eigenvalues of the Wishart Matrix

The technique in Approximating the Largest Eigenvalues in a Matrix works reasonably well for a matrix whose eigenvalues descend in value quickly, or reach almost zero quickly. However, a typical Wishart matrix may have eigenvalues that descend in value more slowly, or perhaps quickly at first and then slowly. This renders the above [Halko2010] technique only marginally useful by itself.

Additionally, some eigenvalues of the Wishart matrix may be zero in certain use cases.

Therefore in Golden Helix SVS, to achieve a reasonable level of accuracy without spending excessive compute time, and to guard against possible zero eigenvalues wrecking the algorithm, we start with the following steps to enhance the [Halko2010] technique:

  • Start with the original Wishart matrix (say, W).

  • Create the (positive definite) matrix \alpha_1 I + W, where, in SVS, we use the value 0.0002 for \alpha_1. Call this matrix A.

    Note

    We avoid actually storing matrix A by itself anywhere, but we simply use large-data techniques to perform the entire operation of finding \alpha_1 I + W with a minimum of memory usage.

    Note

    Mathematically speaking, those eigenvalues of \alpha_1 I + W equal to \alpha_1 correspond to, and have the same eigenvectors as, any zero eigenvalues of W, while each of the remaining eigenvalues of \alpha_1 I + W may be converted to the corresponding remaining eigenvalues of W, for which the corresponding eigenvectors are the same, by subtracting out \alpha_1.
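The analogous in-memory sketch for PCA is even simpler (again, SVS never stores A itself, and the names here are illustrative):

    import numpy as np

    def shifted_wishart(W, alpha1=0.0002):
        """A = alpha1*I + W; eigenvalues equal to alpha1 correspond to zero eigenvalues of W."""
        return alpha1 * np.eye(W.shape[0]) + W

    def to_wishart_eigenvalue(lam_A, alpha1=0.0002):
        """Convert a non-alpha1 eigenvalue of A back to the corresponding eigenvalue of W."""
        return lam_A - alpha1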

Further Processing Steps to Find One Batch of Eigenvalues of Matrix A

  • Use the power-method-with-oversampling variation of the above-mentioned [Halko2010] procedure on matrix A. Use matrices \Omega and Q of width k_{ov}, where k_{ov} is set according to For GBLUP using EMMA: How Large Will the Buffer Width Be? or For PCA: How Large Will the Buffer Width Be?. This setting will be reasonably optimal but still small enough to allow \Omega and Q to fit comfortably into (random-access) memory.

    However, use the following procedure to obtain the best accuracy with minimum effort from this method:

    • Follow the first three steps of Approximating the Largest Eigenvalues in a Matrix (create \Omega, create Y = A\Omega, and orthonormalize) to create a Q matrix, which we call Q_0.

    • Create matrix Y_1 = A Q_0. However, do not yet orthonormalize Y_1.

    • Now, form B_0 = Q_0' A Q_0 = Q_0' Y_1, and find just the eigenvalues of B_0.

    • Then, orthonormalize Y_1 to get Q_1, form Y_2 = A Q_1, form B_1 = Q_1' A Q_1 = Q_1' Y_2, and obtain the eigenvalues of B_1.

    • If enough of the highest eigenvalue estimates have changed slowly enough between those of B_0 and those of B_1, accept these estimates as being the highest eigenvalues of both matrix B_1 and matrix A. “Enough” and “slowly enough” are explained in the next bullet below. Otherwise, repeat the step above followed by this step, except compute Y_3 from Q_2 using an extra scan through the matrix A, as Y_{3a} = A Q_2 and Y_3 = A Y_{3a}, and so on for Y_4, Y_5, etc.

    • To explain the words “enough” and “slowly enough” above: After each iteration r of the step where B_r is formed, compare the magnitudes of the differences between eigenvalue estimates between step r and step r-1, starting with the largest eigenvalue and working through the smaller eigenvalues. As soon as the magnitude of the difference between two estimates of a given eigenvalue is larger than a small \epsilon, stop comparing eigenvalue estimates.

      • For GBLUP using EMMA: You may specify 10^{-3}, 10^{-5}, or 10^{-7} as the \epsilon. (See the table shown in For GBLUP using EMMA: Large N Performance Tradeoffs.)

      • For PCA: 10^{-7} is always used as the \epsilon. (Also, the Max Iterations per Batch (q) is always 60. You might call this mode “HyperSlow”.)

      Call the number of eigenvalues of B_r for which this comparison passed k_r (where k_0 = 0).

    • When the “batch stopping criteria” are met, stop iterating. The meaning of “batch stopping criteria” is explained in the next two sections.

    • For succeeding “batches” of eigenvalues, this procedure is repeated for A - UDU', A - UDU' - U_2D_2U_2', etc.
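The sketch below outlines this iteration in plain NumPy, including the convergence counting that defines k_r (“enough” and “slowly enough”). It assumes A fits in memory, omits the extra scan through A used for Y_3 and later iterations, and leaves the batch stopping decision to the caller (see the criteria in the next two sections); it is an illustration, not the SVS implementation.

    import numpy as np

    def count_converged(prev, curr, eps=1e-7):
        """k_r: number of leading eigenvalue estimates whose change since the last pass is < eps."""
        k_r = 0
        for a, b in zip(prev, curr):          # both sorted in descending order
            if abs(a - b) >= eps:
                break
            k_r += 1
        return k_r

    def batch_iterations(A, k_ov, eps=1e-7, max_iter=60, rng=None):
        """Yield (r, k_r, eigenvalues of B_r, Q_r) after each pass; stopping is up to the caller."""
        rng = np.random.default_rng(rng)
        m = A.shape[0]
        Q, _ = np.linalg.qr(A @ rng.standard_normal((m, k_ov)))   # Q_0
        prev = None
        for r in range(max_iter + 1):
            Y = A @ Q                                             # Y_{r+1} = A Q_r
            eigs = np.sort(np.linalg.eigvalsh(Q.T @ Y))[::-1]     # eigenvalues of B_r
            k_r = 0 if prev is None else count_converged(prev, eigs, eps)   # k_0 = 0
            yield r, k_r, eigs, Q
            prev = eigs
            Q, _ = np.linalg.qr(Y)                                # orthonormalize to get Q_{r+1}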

For GBLUP using EMMA: Batch Stopping Criteria

If r \ge 1 and k_r (r-1) < k_{(r-1)} r, do no further iterations, and “accept” the largest k_{final} = k_r eigenvalues from iteration r.

Note

“Accepting” k_{final} eigenvalues from r iterations is equivalent to finding k_{final} eigenvalues using a “power method” with q=r extra passes and an “oversampling” parameter of k_{ov}.

Note

The idea behind the stopping criterion is that if r > 1 and \frac{k_r}{r} \ge \frac{k_{(r-1)}}{(r-1)}, the average number of eigenvalues per iteration has increased; thus it must have been worthwhile to perform the current iteration, and it presumably will be worthwhile to continue iterating.

Additionally, no matter what the value of r, stop iterating if k_{ov} - f eigenvalues have been estimated satisfactorily, where f is one more than the number of covariates you have specified, and “accept” all of the largest k_{final} = k_{ov} - f eigenvalues.

Note

It may turn out that the above algorithm, if followed strictly, would not stop until after a very high number of iterations r, or not even stop at all. Alternatively, it may stop after only a very tiny number of eigenvalues are “accepted”.

For these reasons, when using this large-data EMMA algorithm, SVS allows you to specify the limit (q) for the number of iterations r. In addition, SVS imposes a maximum oversampling ratio ((k_{ov}/k)_{max}) of two.

The meaning of these restrictions is that in the event that SVS reaches q iterations, this part of the algorithm is stopped and k_{final}=k_{ov}/2 eigenvalues are “accepted”, no matter what.

Also, when the algorithm stops earlier (r < q), at least k_{final} = \frac{r k_{ov}}{q (k_{ov}/k)_{max}} = \frac{r k_{ov}}{2q} eigenvalues are “accepted”. Hopefully, this will be fewer than the number of eigenvalues k_r that the algorithm has already (tentatively) “accepted”. If not, this will have the effect of pushing the algorithm forward at the minimum rate of \frac{k_{ov}}{2q} per iteration.

With these restrictions (maximum power-method iterations and maximum oversampling ratio), a maximum execution time can be guaranteed.
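Under the notation above (q, k_{ov}, f, and the maximum oversampling ratio of two), these stopping rules could be sketched as follows; the function name is ours and is not part of SVS:

    def gblup_accepted_count(k_r, k_prev, r, q, k_ov, f):
        """Return k_final if the batch should stop now, or None to keep iterating."""
        if r >= q:                                   # iteration limit reached
            return k_ov // 2                         # accept k_ov / 2, no matter what
        if k_r >= k_ov - f:                          # all usable eigenvalues estimated
            return k_ov - f
        if r >= 1 and k_r * (r - 1) < k_prev * r:    # eigenvalues-per-iteration decreased
            return max(k_r, (r * k_ov) // (2 * q))   # minimum-progress floor r*k_ov/(2q)
        return None                                  # continue iterating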

For PCA: Batch Stopping Criteria

The normal use case for PCA is:

  • If k_r is greater than or equal to the number p of eigenvalues requested, and we are working on the first batch (which processes directly from matrix A), stop all iteration using the [Halko2010] method and “accept” the greatest p eigenvalues which have been computed as the final answer.

    As stated above, this is considered to be the normal use case for PCA, because (1) only a few eigenvalues are requested, so that the number of eigenvalues requested is less than the buffer width k_{ov}, (2) the algorithm converges for all eigenvalues requested before the maximum of 60 iterations is reached, and as a result, (3) only the very first batch derived directly from matrix A needs to be processed.

Another use case may sometimes happen:

  • Suppose (1) convergence turns out to be more difficult than would normally happen and thus the algorithm failed, after 60 iterations, to find all requested eigenvalues within the normal precision within the first batch, but meanwhile, (2) the buffer width k_{ov} is more than twice the number of requested eigenvalues p.

    In this case, the largest p eigenvalues are simply “accepted” as the final answer.

In the rare case that, due either to memory considerations or to a large number of requested components p, the buffer width k_{ov} is less than 2p or even less than p, multiple batches may be necessitated, which will result in these use cases:

  • If, on any batch, the number of eigenvalues found k_r plus the number of all previously found eigenvalues p_p is greater than or equal to the number p of eigenvalues requested, stop the algorithm and “accept” the greatest p - p_p eigenvalues from this batch.

  • On any batch, if all possible batch eigenvalues have been found (k_r = k_{ov}), “accept” these k_r eigenvalues and start on the next batch.

  • On any batch, when 60 iterations have been reached but k_r plus the number of all previously found eigenvalues is still less than the number p of eigenvalues requested, stop working on the current batch, “bolster up” the working value of k_r to at least k_{ov} / 2, and “accept” the latest estimates of the greatest k_r eigenvalues.

Call the number of eigenvalues found or accepted when the current batch has stopped k_{final}.

Processing to Finish a Batch

Now that we have “accepted” k_{final} eigenvalues from iteration r of the algorithm above, compute the eigenvectors V_r of B_r. Designate the matrix with the diagonal containing the highest k_{final} of the eigenvalues of B_r (the ones that we “accepted”) to be \Lambda_{Br1}, and the eigenvectors corresponding to these “accepted” eigenvalues V_{r1}. Use these to approximate the highest k_{final} eigenvalues of A (diagonal of matrix \Lambda_1) and their corresponding eigenvectors U_1 as

\Lambda_1 = \Lambda_{Br1}

and

U_1 = Q_r V_{r1} .

Since matrix A may be written as

A = U_1 \Lambda_1 U_1'  +  U_2 \Lambda_2 U_2' ,

where \Lambda_2 and U_2 represent the remaining eigenvalues and eigenvectors after \Lambda_1 and U_1, consider matrix A as decomposed into

A = Q_r V_{r1} \Lambda_{Br1} V_{r1}' Q_r'  +  U_2 \Lambda_2 U_2' .
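In the in-memory sketch notation used earlier, finishing a batch amounts to the following (illustrative only, not SVS code):

    import numpy as np

    def finish_batch(B_r, Q_r, k_final):
        """Lift the accepted top-k_final eigenpairs of B_r back to eigenpairs of A."""
        lam, V = np.linalg.eigh(B_r)                 # ascending order
        lam, V = lam[::-1], V[:, ::-1]               # reorder to descending
        Lambda1 = lam[:k_final]                      # accepted eigenvalues (diagonal of Lambda_1)
        U1 = Q_r @ V[:, :k_final]                    # U_1 = Q_r V_{r1}
        return Lambda1, U1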

For GBLUP using EMMA: When to Stop the Whole Algorithm

If m - f eigenvalues and eigenvectors have been obtained (for matrix A, and thus for matrix S(K+I)S), stop the algorithm. This many eigenvalues and eigenvectors are all that are required by the EMMA technique.

For PCA: When to Stop the Whole Algorithm

As stated above:

  • If we are working on the first batch (which processes directly from matrix A), and k_r is greater than or equal to the number p of eigenvalues requested, stop all iteration using the [Halko2010] method and “accept” the greatest p eigenvalues which have been computed.

  • If, for some reason, the algorithm needs to work on succeeding batches after the first batch, then it is the number of eigenvalues found in the current batch, k_r, plus the number of all previously found eigenvalues, p_p, that needs to be greater than or equal to the number p of eigenvalues requested in order to stop the algorithm; the greatest p - p_p eigenvalues from this batch are then “accepted” to finish finding all p eigenvalues.

If a New Batch Needs to Be Initiated:

  • Use large-data techniques to form a new matrix A by subtracting off the first part of our decomposition (see the sketch following this list), as

    A_{new} = A_{old} - Q_r V_{r1} \Lambda_{Br1} V_{r1}' Q_r' .

    Then, repeat the above variation of the power method along with this step.

    Note

    Because the original matrix A was formed from the actual (positive semidefinite) matrix we are interested in plus \alpha_1 I, where \alpha_1 is a smallish positive number, performing the above subtraction will leave behind all of the remaining eigenvalues and eigenvectors of matrix A, while eigenvalues of A_{new} corresponding to the eigenvectors that we have just found will essentially be zero, thus keeping them out of the way of later calculations.

  • If m minus the number of eigenvalues thus far obtained is less than k_{ov}, appropriately reduce the width of the \Omega and Q matrices.

    Note

    In addition, under this circumstance, when the algorithm stops, whether or not q iterations are reached, all remaining eigenvalues (out of (m-f) eigenvalues or, for PCA, out of the remaining requested eigenvalues) are “accepted”, because all the information related to the remaining (non-zero) eigenvalues should have been effectively contained in the \Omega and Q matrices.
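A minimal in-memory sketch of the deflation in the first bullet above (SVS performs the equivalent subtraction with large-data techniques rather than forming A_{new} explicitly; the function name is ours):

    def deflate(A, U1, Lambda1):
        """A_new = A_old - U1 diag(Lambda1) U1'; the accepted eigenvalues drop to roughly zero."""
        return A - (U1 * Lambda1) @ U1.T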

For GBLUP using EMMA: Large N Performance Tradeoffs

As mentioned earlier, when dealing with many samples, computation of eigenvalues and eigenvectors for GBLUP (Performing GBLUP Analysis) takes much longer. Thus, when using large-data mode for GBLUP (and for K-Fold cross validation using GBLUP), approximations must be made that trade some accuracy for speed and computability. Therefore, when you use either of these features with more than 8,000 samples, and before you are given the standard options relating to the feature, you are given a choice of which large-data approach to use from the following set of options:

Mode       Approach   Max Iterations per Batch   Early Exit Precision
Slow       Large N    30                         10^{-7}
Medium     Large N    12                         10^{-5}
Quick      Large N    5                          10^{-5}
Quickest   Large N    2                          10^{-3}
Exact      Small N    N/A                        N/A

“Max Iterations per Batch” refers to the maximum number q of iterations to use (see the fifth note in For GBLUP using EMMA: Finding All the Eigenvalues of Matrix S(K+I)S), and “Early Exit Precision” refers to the tolerance to use (see the bullet before the third note in For GBLUP using EMMA: Finding All the Eigenvalues of Matrix S(K+I)S).

Note

If you separately compute a Genomic Relationship Matrix (Separately Computing the Genomic Relationship Matrix) from many samples, SVS will automatically go into large-data mode, but still compute the exact Genomic Relationship Matrix in a reasonable amount of time.

Summary of Performance Tradeoffs

  • If you have more than 8,000 samples (5,500 samples for 32-bit systems), you will be asked to choose your data analysis mode before starting analysis of either GBLUP or of K-Fold cross validation using GBLUP.

  • Before entering your project and starting your calculations, set the “Transpose and analysis memory usage” to the highest memory level that you have available (Global Product Options), ideally to somewhat more than the ideal memory usages documented in Large N Memory Considerations. This will ensure the best accuracy when you use Large Data mode.

  • For only moderately more than 8,000 samples, use the “Medium” or “Slow” Large N mode. Alternatively, if you have sufficient RAM, and you are willing to wait for eigenvalues to be computed (may be several minutes) without being able to interrupt these computations, you could use the “Exact” (Small N) mode (Computation Time).

  • For a medium number of samples, such as 15,000, using the “Quick” or “Quickest” method will bring the computation time down. Also, consider using the “Exact” method if you have sufficient RAM and can wait for the eigenvalue computations (see above bullet), or using the approach (mentioned in the next bullet) of basing the kinship matrix on a subset of markers containing many fewer markers than the number of samples you have.

  • For large numbers of samples, you should definitely consider using a kinship matrix based on just one chromosome, or on a marker subset created from LD pruning. (See Alternative Approaches.)

  • If you get messages in the spreadsheet logs starting with “NOTE:” that count how many eigenvalues were approximate or were rounded up, and a significant number of eigenvalues had to be approximated in this way, consider using a slower method (more maximum iterations per batch) to improve accuracy, or, if possible, using more memory for the calculations (see the bullet above about memory usage).

Computation Time

With a 64-bit system you will always want to run GBLUP (and K-Fold cross validation using GBLUP) using the Exact method if enough RAM is available and you are willing to wait for eigenvalues to be computed without being able to easily stop or cancel the process. We have been able to process kinship matrices from 54,000 samples using the Exact method on a computer with 128 GB of RAM. At a certain (as yet undetermined) point, having more RAM will not make a difference in whether or not you can use the Exact (Small N) method, as systems will not allow it to be used for the algorithm.

Using a 48k simulated marker dataset, tests were performed with varying sample sizes using the above five modes. On a computer with 8 cores, 8 GB of RAM and a CPU speed of 3.1 GHz, the following table illustrates the time required for the sample sizes listed.

# Samples   Slow                 Medium              Quick                            Quickest                         Exact (Small N Algorithms)
2k          ~1 min               1 min               ~1 min                           ~1 min                           ~1 min
4k          ~10 min              7 min               5 min                            3 min                            ~2 min
8k          76 min               80 min              32 min                           19 min                           ~5 min
20k         1133 min / ~19 hrs   823 min / ~14 hrs   469 min / ~8 hrs                 234 min / ~4 hrs                 ~57 min
40k         Not computed         Not computed        3556 min / ~59 hrs / ~2.5 days   2042 min / ~34 hrs / ~1.5 days   ~5796 min / ~4 days

Based on the data accumulated, the Quickest method, extrapolated with a cubic best fit, should finish 100,000 samples in roughly 26 days. A faster computer with more cores could speed up this computation time.

Precision of Results

Precision was measured by taking the variance of the difference between methods divided by the total variability. There was no difference between the Slow method and the Exact method for all sample sizes computed (up to 20k). Variability started to be introduced with the Medium method at four thousand samples, but the variability was almost negligible. The Quickest method had the most variability for each sample size computed except for the 2,000-sample set. Of the results output, the Random Effects Components for each sample and the Normalized Absolute ASE values are those most affected by the compromises made to obtain speed increases.

The method used for these precision computations is the ratio of the measurement system variability to total variability (\rho_M),

\rho_M = \frac{\sigma^2_{Gauge}}{\sigma^2_{Total}}

where \sigma^2_{Gauge} = \left( \frac{\bar{R}}{d_2} \right)^2, with \bar{R} = \mathrm{mean}(|measurement_1 - measurement_2|_{N \times 1}) and d_2 = 1.128 (because only two measurements are being compared), and where \sigma^2_{Total} = s^2, with s being the sample standard deviation of the averages of the two measurements (\bar{x}).
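For reference, this metric could be computed as follows, assuming x1 and x2 are arrays holding the paired results from the two methods being compared (the function name is ours):

    import numpy as np

    def rho_M(x1, x2, d2=1.128):
        """Ratio of measurement-system (gauge) variability to total variability."""
        R_bar = np.mean(np.abs(x1 - x2))             # mean absolute paired difference
        sigma2_gauge = (R_bar / d2) ** 2
        sigma2_total = np.var((x1 + x2) / 2, ddof=1) # s^2 of the paired averages (x-bar)
        return sigma2_gauge / sigma2_total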

Determining Influential Markers

Of the top 100 markers from the Exact method compared to the top 100 markers for the Quickest method in the 20k sample set, all but three overlap. Thus, it is reasonable to assume that a prediction model based on the top markers from the quickest method would result in a very similar conclusion to that built on the exact method. More work should be done in confirming this hypothesis.

Large N Memory Considerations

Large-data-mode GBLUP will automatically detect how much “Transpose and analysis memory usage” you have told SVS is available, and will consider that amount to be available for Large-data-mode GBLUP analysis (Global Product Options).

To ensure the most accuracy, you should allow SVS to use as much memory as possible, up to its upper limit, which is

1104 (N^{3/2})

bytes of memory for its computations, where N is the number of samples.

For the following sample sizes, this formula translates to:

# Samples   Memory
8k          ~790 MB
20k         ~3.1 GB
40k         ~8.8 GB
100k        ~34.9 GB

This limitation is to keep in check the time spent computing one of the steps of the process (namely the Gram-Schmidt orthonormalization step). However, this amount of memory is also an ideal size for these computations.
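As a quick check of the table above, the formula can be evaluated directly (decimal units, as in the table):

    def ideal_memory_bytes(n_samples):
        """Upper limit of memory SVS will use: 1104 * N^(3/2) bytes."""
        return 1104 * n_samples ** 1.5

    for n in (8_000, 20_000, 40_000, 100_000):
        print(f"{n:>7,} samples: {ideal_memory_bytes(n) / 1e9:.1f} GB")
    # 8,000: 0.8 GB; 20,000: 3.1 GB; 40,000: 8.8 GB; 100,000: 34.9 GB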

If this amount of memory is not available, the memory that is available will be used. However, in the following cases, this imposes the limitations shown below on the amount of memory used, expressed as a percentage of the ideal memory:

# Samples   256 MB   512 MB   1 GB     8 GB     16 GB
8k          32%      65%      (100%)   (100%)   (100%)
20k         8%       16%      32%      (100%)   (100%)
40k         3%       6%       11%      91%      (100%)
100k        .7%      1.5%     3%       23%      46%

As stated above, having more memory available for Large-N-mode GBLUP will, in general, improve the accuracy that will be obtained.

Meanwhile, the amount of memory used will not appreciably affect the maximum possible time of execution, one way or the other, because this amount simply trades off how many eigenvalues are computed per top-layer iteration vs. how many top-layer iterations may be required at a maximum.

Comparing Machines

The Large N implementation relies heavily on the ability of the NumPy and SciPy libraries for Python to use multiple threads. This means that the Large N method must be run on a Windows machine to perform the computation in a reasonable amount of time.

This table shows run times for GBLUP using the “Quickest” mode on two different systems:

# Samples   Windows 8 Core 8 GB RAM   Linux Server
2000        0.95 min                  5.4 min
4000        3.5 min                   44 min
8000        19 min                    217 min

Alternative Approaches

One suggested alternative approach has been to split up the analysis chromosome by chromosome. However, the bulk of the computation is performed on the sample-by-sample dimension regardless of the number of markers in the dataset. This means that for a given kinship matrix for 100,000 samples, the amount of time it will take to perform the analysis will be roughly the same for five thousand markers, 48 thousand markers, or one million markers.

However, the computation needed to process a kinship matrix which is a GBLUP Relationship Matrix formed from many fewer markers than samples need not be as much, since, in this case, many or most of the non-zero eigenvalues of S(K+I)S are the minimum possible value of one. The Quick method can process these minimum-possible eigenvalues up to seven times faster than would otherwise normally be possible with the Quick method, while larger performance gains are possible with the Medium and Slow methods for these minimum-possible eigenvalues. Therefore, for instance, if the kinship matrix is based on 40,000 samples and 10,000 markers, processing the entire kinship matrix with the Quick method should take about one third the time it would for the Quick method to process a kinship matrix based on 40,000 samples and 40,000 markers.

To take advantage of this fact, a good alternative approach is to simply base the kinship matrix on one chromosome (to get a reasonable amount of information about the inheritance pattern) and perform the analysis on all of the markers using this kinship matrix.

Another alternative approach, which may be combined with the approach to base the kinship matrix on only one chromosome, is to base the kinship matrix on a subset of markers obtained by LD Pruning (LD Pruning).

Note

It is important to set the largest possible memory size that you have available in your system for “Transpose and analysis memory usage” (Global Product Options), up to and past the upper limits mentioned in Large N Memory Considerations, in order to have the best possible accuracy for these calculations and to ensure the best likelihood that this optimization actually takes place.