SVS 7 introduces several quality control metrics for identifying poor-quality SNPs and samples. This tutorial focuses on a new option under the Quality Assurance menu, PBAT Family-Based QC, which detects Mendelian errors and samples with poor overall genotype quality.
Note
Though not covered in this tutorial, it is still appropriate to apply other, non-family-based quality assurance metrics to exclude poor-quality samples and markers from analysis. Several additional options are available under a spreadsheet's Quality Assurance menu.
Upon completion, a new spreadsheet, PBAT QA Results (Figure 3a), is created with various quality control statistics. In this tutorial we’ll focus on removing SNPs that have one or more Mendelian errors.
This will inactivate every row that has at least one Mendelian error. We will then use the rows that remain active in this spreadsheet to activate their respective columns in the CEU All - Sheet 1 spreadsheet.
This will create a new spreadsheet, CEU All - Sheet 2, with 19,090 active columns.
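The filtering idea behind these steps can be sketched outside of SVS. The Python below is a minimal illustration, not the PBAT or SVS implementation: it assumes autosomal markers, unphased genotypes stored as allele pairs, and complete father/mother/child trios. The function names and the toy marker names are hypothetical.

```python
from itertools import product


def is_mendelian_error(father, mother, child):
    """Return True if the child's genotype cannot be formed by taking
    one allele from each parent. Genotypes are 2-tuples of alleles,
    e.g. ("A", "B"); None marks a missing genotype."""
    if father is None or mother is None or child is None:
        return False  # a missing genotype is not counted as an error
    # Every unordered genotype the parents could transmit.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) not in possible


def markers_without_errors(trios_by_marker):
    """Keep the names of markers with zero Mendelian errors across all trios."""
    return [
        marker
        for marker, trios in trios_by_marker.items()
        if not any(is_mendelian_error(*trio) for trio in trios)
    ]


# Hypothetical toy data: marker name -> list of (father, mother, child).
toy = {
    "rs_example_1": [(("A", "A"), ("A", "B"), ("A", "B"))],
    "rs_example_2": [(("A", "A"), ("A", "A"), ("A", "B"))],  # B cannot be inherited
    "rs_example_3": [(("A", "B"), ("B", "B"), None)],        # child genotype missing
}

kept = markers_without_errors(toy)
print(kept)
```

Here the second marker is dropped because neither parent carries the B allele, which mirrors inactivating a row with one or more Mendelian errors and keeping only the matching columns active.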
The latest version of PBAT incorporates a novel test that assesses the genotyping quality of individual probands in family-based association studies. Published in PLoS Genetics [Fardo, 2009], these tests are “ideally suited as the final layer of quality assurance filters in the cleaning process of genome-wide association studies.”
Another PBAT QA Results spreadsheet is created (Figure 3b), this time with quality control metrics for each proband. In the paper cited above, Fardo et al. suggest that, on a genome-wide scale, probands with a score greater than 30 should be considered to have poor genotyping quality.
Notice that 5 samples have a Tgw value greater than 30. However, this particular dataset contains genotypes for chromosome 22 only, so the reported statistics do not necessarily translate to a whole-genome scale. Therefore, for this tutorial we will not exclude any samples.
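For a genome-wide dataset, the threshold check described above could be applied to exported scores along these lines. This is a sketch under stated assumptions: the scores are assumed to be available as a mapping from proband identifier to Tgw value, the identifiers and values are made up, and the function only flags probands rather than excluding them.

```python
TGW_THRESHOLD = 30.0  # genome-wide cutoff quoted in the text from Fardo et al.


def flag_probands(scores, threshold=TGW_THRESHOLD):
    """Return the sorted IDs of probands whose score is strictly greater
    than the threshold. `scores` maps proband ID -> Tgw value."""
    return sorted(pid for pid, score in scores.items() if score > threshold)


# Hypothetical scores; real values would come from the PBAT QA Results sheet.
toy_scores = {"proband_a": 12.4, "proband_b": 30.0, "proband_c": 41.7}

flagged = flag_probands(toy_scores)
print(flagged)
```

A score exactly equal to 30 is not flagged, since the text speaks of scores greater than 30.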