4. Sample QA - II: Cryptic Relatedness

Next, find and filter samples determined to be “related” to other samples. Relatedness is often defined as family relatedness, but identity by descent (IBD) estimation can also detect duplicate samples, duplicate samples from one of a pair of genotyping chips but not the other, or sample contamination. Before estimating IBD, standard practice is to first prune SNPs that are in linkage disequilibrium (LD) with one another, so that the estimates are based on approximately independent markers.

  • Open Subset - Samples with Call Rate >=.95 and Matched Gender.
  • Choose Quality Assurance >Genotype >LD Pruning.
  • Enter 100 for Window Size, 5 for Window Increment, and 0.5 for LD r^2 Threshold, select CHM for LD Computation Method, and click OK.
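The windowed pruning configured above can be sketched in a few lines of Python. This is a simplified illustration, not the software's implementation: it measures LD as the squared Pearson correlation of genotype dosages (0, 1, 2) rather than using the CHM method, it always deactivates the later marker of a correlated pair (an arbitrary choice), and the names `r_squared` and `ld_prune` are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_x * var_y)


def ld_prune(markers, window_size=100, increment=5, threshold=0.5):
    """Return sorted indices of the markers kept after windowed pruning.

    markers: list of per-marker dosage lists (one value per sample).
    Within each window, whenever a still-active pair exceeds the r^2
    threshold, the later marker of the pair is deactivated (assumption).
    """
    active = set(range(len(markers)))
    start = 0
    while start < len(markers):
        end = min(start + window_size, len(markers))
        window = [i for i in range(start, end) if i in active]
        for i, j in combinations(window, 2):
            if i in active and j in active:
                if r_squared(markers[i], markers[j]) > threshold:
                    active.discard(j)
        start += increment  # slide the window forward
    return sorted(active)


# Toy data: 4 markers genotyped in 6 samples (made-up values).
toy_markers = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to marker 0
    [0, 0, 0, 2, 2, 2],  # uncorrelated with marker 0
    [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with marker 0
]
print(ld_prune(toy_markers))  # -> [0, 2]
```

A smaller window limits which pairs are ever compared: with `window_size=2, increment=1`, markers 0 and 3 never share a window, so marker 3 stays active.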

This will take a few minutes. Upon finishing, 286,795 markers are deactivated as designated by the grayed out columns in the spreadsheet. You can also see in the upper-right portion of the window that only 212,469 columns are active out of 499,264. You will be using only the active columns (non-correlated markers) for autosomal chromosomes to perform IBD, so first create a column subset spreadsheet and then inactivate the X chromosome before continuing.

  • Choose Select >Column >Column Subset Spreadsheet.

  • Then from the subset spreadsheet, choose Select >Activate by Chromosomes, uncheck the X chromosome and click OK.

    Note

    In datasets with other non-autosome chromosomes, inactivate these as well.

  • In the Project Navigator rename this node to Pruned SNP Subset.
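Restricting the pruned set to autosomes amounts to a filter on each marker's chromosome label. A minimal sketch, assuming human data (autosomes 1 to 22) and a hypothetical dictionary that maps marker names to chromosome labels:

```python
AUTOSOMES = {str(n) for n in range(1, 23)}  # human autosomes 1-22


def autosomal_markers(marker_chromosomes):
    """Return the names of markers located on an autosome.

    marker_chromosomes: dict mapping marker name -> chromosome label.
    Any label outside 1-22 (X, Y, MT, ...) is excluded.
    """
    return [name for name, chrom in marker_chromosomes.items()
            if chrom in AUTOSOMES]


# Hypothetical marker map for illustration only.
example_map = {"snp_a": "1", "snp_b": "X", "snp_c": "22", "snp_d": "MT"}
print(autosomal_markers(example_map))  # -> ['snp_a', 'snp_c']
```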

Now you are ready for IBD estimation.

  • From the Pruned SNP Subset spreadsheet, choose Quality Assurance >Genotype >Identity by Descent Estimation.
  • For this exercise, uncheck Output untransformed estimates of P(Z=0), P(Z=1), and P(Z=2), make sure that Output PI = P(Z=1)/2 + P(Z=2) is checked, and check Output all pairs where PI > ___ (enter 0), and click Run.

This will take a few minutes. Upon completion, two spreadsheets are output: IBD Estimate: Estimated PI and Pairwise IBD Estimates (PI >=0).

IBD Estimate: Estimated PI gives an N x N table, where N is the number of samples in the dataset. By plotting a heat map of this table we can detect patterns showing relatedness. Pairwise IBD Estimates (PI >=0) outputs various IBD statistics for every sample pair. These values can be sorted or plotted to find samples related to one another or to detect sample contamination.
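To make the relationship between the two outputs concrete, the sketch below computes PI = P(Z=1)/2 + P(Z=2) for each pair and arranges the values in a symmetric N x N table. The function names and the input layout are hypothetical, and setting the diagonal to 1.0 is an assumption of this sketch rather than documented behavior of the tool.

```python
def pi_hat(p_z1, p_z2):
    """Proportion of the genome shared IBD: PI = P(Z=1)/2 + P(Z=2)."""
    return p_z1 / 2 + p_z2


def pi_matrix(samples, pairwise):
    """Build a symmetric N x N table of PI values.

    samples: list of sample IDs.
    pairwise: dict mapping (id_a, id_b) -> (P(Z=0), P(Z=1), P(Z=2)).
    Pairs missing from `pairwise` stay at 0.0.
    """
    index = {sample: k for k, sample in enumerate(samples)}
    n = len(samples)
    table = [[0.0] * n for _ in range(n)]
    for k in range(n):
        table[k][k] = 1.0  # assumption: a sample fully matches itself
    for (a, b), (_p0, p1, p2) in pairwise.items():
        value = pi_hat(p1, p2)
        table[index[a]][index[b]] = value
        table[index[b]][index[a]] = value
    return table


# Idealized probabilities for a family trio (illustrative, not real data).
trio = ["child", "father", "mother"]
trio_pairs = {
    ("child", "father"): (0.0, 1.0, 0.0),   # parent-child: PI = 0.5
    ("child", "mother"): (0.0, 1.0, 0.0),   # parent-child: PI = 0.5
    ("father", "mother"): (1.0, 0.0, 0.0),  # unrelated:    PI = 0.0
}
for row in pi_matrix(trio, trio_pairs):
    print(row)
```

Real estimates are noisy, so observed values scatter around these idealized numbers rather than matching them exactly.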

  • From the IBD Estimate: Estimated PI, choose Plot >Heat Map.

You’ll get the plot in Figure 4-1.

Figure 4-1. Default heat map of IBD PI estimates

By default, the heat map uses a three-color scheme calculated automatically. We want to define the color scheme manually, using two colors, so that we can look for sample pairs with a PI estimate of 0.25 or greater. A PI of 0.25 corresponds to second-degree relatives, 0.5 to first-degree relatives, and 1 to identical twins or duplicate samples.

  • Click on the IBD Estimate: Estimated PI node in the Graph Control Interface (upper-left portion of the window).
  • On the Color tab choose Manual and then right-click the middle option (0.05154762), and select Delete.
  • Right-click the first parameter, select Edit, and change it to 0.2. Then click once on the color box next to the first parameter and select white; click OK. Change the second color parameter to red. Your plot should look like Figure 4-2.

Figure 4-2. Heat map with two colors

You can begin to see samples along the diagonal line that appear to be related to one another. You can zoom in to see this more clearly by clicking and dragging a red box in the plot area (Figure 4-3).

Figure 4-3. Zoomed in area around several related individuals
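The related pairs that stand out in the heat map can also be pulled from the pairwise PI values by sorting and thresholding. The sketch below is one way to do that outside the tool. The input dictionary and function name are hypothetical, and labeling each pair by the nearest expected PI value (0.25, 0.5, or 1) is a simplifying assumption, since real estimates are noisy.

```python
# Expected PI values for the relationship classes named in this section.
EXPECTED = {
    0.25: "second degree",
    0.5: "first degree",
    1.0: "identical or duplicate",
}


def flag_related(pairwise_pi, cutoff=0.25):
    """Return (pair, pi, label) for pairs with PI >= cutoff, highest first.

    pairwise_pi: dict mapping (id_a, id_b) -> PI estimate.
    The label is the expected value closest to the estimate (assumption).
    """
    flagged = []
    for pair, pi in pairwise_pi.items():
        if pi >= cutoff:
            nearest = min(EXPECTED, key=lambda expected: abs(expected - pi))
            flagged.append((pair, pi, EXPECTED[nearest]))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up estimates for illustration only.
estimates = {
    ("A", "B"): 0.52,
    ("A", "C"): 0.03,
    ("B", "C"): 0.27,
    ("D", "E"): 0.98,
}
for pair, pi, label in flag_related(estimates):
    print(pair, pi, label)
```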

In most population-based studies you’ll typically find a couple of sample pairs that are cryptically related in one way or another, which you can subsequently remove. This particular study is known to include several family trios, and the cluster pattern above reflects this. For the purpose of this tutorial we won’t exclude any sample pairs here, but if this were your own study, you would need to decide which member(s) of each trio to keep or discard.
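One possible way to make that decision is a greedy heuristic: repeatedly drop the sample involved in the most remaining related pairs until no related pair is left. This is only a sketch of one option, not a procedure prescribed by this tutorial. In practice you may also want to weigh call rate or phenotype availability, and the function name is hypothetical.

```python
def greedy_unrelated(related_pairs):
    """Return the set of samples to drop so that no related pair remains.

    related_pairs: iterable of (id_a, id_b) tuples flagged as related.
    At each step the sample appearing in the most remaining pairs is
    dropped; ties are broken alphabetically to keep the result stable.
    """
    remaining = {frozenset(pair) for pair in related_pairs}
    dropped = set()
    while remaining:
        counts = {}
        for pair in remaining:
            for sample in pair:
                counts[sample] = counts.get(sample, 0) + 1
        worst = min(counts, key=lambda s: (-counts[s], s))
        dropped.add(worst)
        remaining = {pair for pair in remaining if worst not in pair}
    return dropped


# In a trio the child is related to both parents, who are unrelated to
# each other, so dropping the child leaves two unrelated samples.
trio_related = [("child", "father"), ("child", "mother")]
print(greedy_unrelated(trio_related))  # -> {'child'}
```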
