Welcome to the Genomic Prediction with K-Fold Tutorial!


Updated: December 6, 2018

Level: Advanced

Version: 8.7.0 or higher

Product: SVS

K-Fold cross validation is a model validation technique that can be used to assess how well a model can predict a phenotype for a given independent dataset. To evaluate our model, we use samples for which we have both genotype and phenotype data.

To prepare for k repetitions of the cross-validation procedure ("K-Fold"), SVS partitions our data into k subsets of samples, which we shall call subsamples. During each repetition, or "fold", one subsample is selected to be the validation set while the model is fitted to the other k - 1 subsamples. Using the combined results from the k repetitions, we can analyze how well the model predicted the phenotypes for our dataset and estimate how well it would predict an independent dataset.
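The partition-and-repeat procedure described above can be illustrated with a minimal Python sketch. This is not SVS's implementation: the function names `k_fold_indices` and `cross_validate_mean`, the `seed` parameter, and the toy phenotype values are all hypothetical, and the "model" is a stand-in that simply predicts the mean phenotype of the training samples rather than a genomic prediction method.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices 0..n_samples-1 into k subsamples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so subsample sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate_mean(phenotypes, k, seed=0):
    """Return one out-of-fold prediction per sample (hypothetical stand-in model)."""
    predictions = [None] * len(phenotypes)
    for validation in k_fold_indices(len(phenotypes), k, seed):
        held_out = set(validation)
        # The model is fitted to the other k - 1 subsamples.
        training = [i for i in range(len(phenotypes)) if i not in held_out]
        # Placeholder "model": the mean phenotype of the training samples.
        fitted_mean = sum(phenotypes[i] for i in training) / len(training)
        # Predict only the samples in the validation subsample.
        for i in validation:
            predictions[i] = fitted_mean
    return predictions


phenotypes = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # made-up values for illustration
predictions = cross_validate_mean(phenotypes, k=3)
print(predictions)
```

Because every sample lands in exactly one validation subsample, the combined output holds one prediction per sample, each made by a model that never saw that sample during fitting.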

Subsamples are drawn using either simple random sampling or, if a subpopulation grouping is present, stratified random sampling.
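To show how stratified random sampling differs from simple random sampling, here is a minimal Python sketch that spreads each subpopulation evenly across the k subsamples. It is an assumption about how such a scheme could work, not a description of SVS internals; the function name `stratified_k_fold`, the `seed` parameter, and the group labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(groups, k, seed=0):
    """Assign sample indices to k subsamples, balancing each group across them."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    # Collect the sample indices belonging to each subpopulation.
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for members in by_group.values():
        # Shuffle within the group, then deal its members round-robin.
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[(offset + position) % k].append(index)
        # Continue dealing where the previous group stopped, which keeps
        # the overall subsample sizes within one sample of each other.
        offset += len(members)
    return folds


groups = ["A"] * 6 + ["B"] * 3  # made-up subpopulation labels
for fold in stratified_k_fold(groups, k=3):
    print(sorted(groups[i] for i in fold))
```

With simple random sampling, a small subpopulation can end up concentrated in one or two subsamples by chance; dealing each group separately keeps its share of every subsample close to its share of the whole dataset.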

Using a Soybean dataset, this tutorial will lead you step-by-step through the workflow for determining whether a prediction model is appropriate.


This tutorial will not cover the data importing and quality control procedures that were used. To learn more about the various quality control tools available in SNP & Variation Suite, please refer to the manual.


To complete this tutorial you will need the following:

We hope you enjoy the experience and look forward to your feedback.