Welcome to the Genomic Prediction with K-Fold Tutorial!

_images/ScatterPlotGBLUPFinal.jpg

Updated: March 22nd, 2017

Level: Intermediate

Version: 8.7.0 or higher

Product: SVS

K-Fold cross validation is a model validation technique used to assess how well a model can predict a phenotype for an independent dataset. To evaluate our model, we use samples for which we have both genotype and phenotype data.

Our data is partitioned into k subsamples. In each of k iterations, one subsample is selected as the validation set and the model is fitted to the other k - 1 subsamples. Combining the results from the k iterations, we can then analyze how well the model predicted the phenotypes for our dataset and estimate how well it would predict an independent dataset.
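The partitioning scheme described above can be sketched in a few lines of Python. This is only an illustration of the general k-fold idea, not how SVS implements it: the function names (`k_fold_indices`, `cross_validate`, `fit_line`, `predict_line`) are hypothetical, the toy data is made up, and a simple least-squares line stands in for a genomic prediction model.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def cross_validate(x, y, k, fit, predict, seed=0):
    """Return one out-of-fold prediction per sample."""
    folds = k_fold_indices(len(y), k, seed)
    predictions = [None] * len(y)
    for validation in folds:
        held_out = set(validation)
        # Fit on the other k - 1 folds only.
        training = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in training], [y[i] for i in training])
        # Predict the held-out fold with the model that never saw it.
        for i in validation:
            predictions[i] = predict(model, x[i])
    return predictions

# Placeholder model (an assumption for this sketch): a least-squares line
# on a single predictor.
def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((v - mean_x) ** 2 for v in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def predict_line(model, value):
    slope, intercept = model
    return slope * value + intercept

# Toy data: 20 samples with an exactly linear "phenotype".
x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 for v in x]
predicted = cross_validate(x, y, k=5, fit=fit_line, predict=predict_line)
```

Because every sample lands in exactly one validation fold, `predicted` holds one prediction per sample, each made by a model that was not fitted to that sample. Comparing `predicted` with `y` is what gives an estimate of performance on independent data.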

Subsamples are drawn with simple random sampling, or with stratified random sampling if a subpopulation grouping is present.
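As a rough illustration of the stratified case, the sketch below shuffles the samples within each subpopulation and then deals them out to the folds in turn, so that each fold receives a similar share of every group. The function name `stratified_k_fold_indices` and the group labels are hypothetical, and the exact assignment rule used by SVS may differ.

```python
import random
from collections import defaultdict

def stratified_k_fold_indices(groups, k, seed=0):
    """Assign sample indices to k folds, balancing each group across folds."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0  # carries over between groups so overall fold sizes stay even
    for label in sorted(by_group):
        members = by_group[label]
        rng.shuffle(members)  # random order within the subpopulation
        for position, index in enumerate(members):
            folds[(offset + position) % k].append(index)
        offset += len(members)
    return folds

# Toy grouping: 12 samples from subpopulation "A" and 8 from "B".
groups = ["A"] * 12 + ["B"] * 8
stratified_folds = stratified_k_fold_indices(groups, k=4)
fold_composition = [
    (sum(groups[i] == "A" for i in fold), sum(groups[i] == "B" for i in fold))
    for fold in stratified_folds
]
```

With these toy labels, each of the four folds ends up with three "A" samples and two "B" samples, mirroring the 12:8 ratio of the full dataset.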

Using a Soybean dataset, this tutorial will lead you step-by-step through the workflow for determining whether a prediction model is appropriate.

Note

This tutorial will not cover the data importing and quality control procedures used. To learn more about the various quality control tools available in SNP & Variation Suite, please refer to the manual.

Requirements

To complete this tutorial you will need the following:

We hope you enjoy the experience and look forward to your feedback.