2. Quality Assurance

SVS 7 introduces several quality control metrics for identifying poor-quality SNPs and samples. This tutorial focuses on a new option under the Quality Assurance menu, PBAT Family-Based QA, which detects Mendelian errors and samples with poor overall genotype quality.

Note

Though not covered in this tutorial, it is still appropriate to apply other non-family-based quality assurance metrics to exclude poor quality samples and markers from analysis. Several additional options are available under the Quality Assurance menu from a spreadsheet.

A. Quality Control by Marker

Figure 3a. PBAT QA Results by marker

  • Open CEU All - Sheet 1 and select Quality Assurance >PBAT Family-Based QA.
  • Under Computation parameters check Use alternative rapid pedigree algorithm. This option needs to be checked in order for PBAT to report Mendelian errors.
  • Under Output choose Output by marker.
  • Leave all other parameters at their defaults and click Run.

Upon completion a new spreadsheet is created, PBAT QA Results (Figure 3a), with various quality control statistics. In this tutorial we’ll focus on removing SNPs that have one or more Mendelian errors.
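For readers new to the term, a Mendelian error is a child genotype that cannot be formed by taking one allele from each parent. The sketch below illustrates the idea for a single marker across a few trios. It is not PBAT's algorithm: the function names and genotypes are hypothetical, and it assumes complete genotypes with no missing calls.

```python
def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can be formed from one paternal
    allele and one maternal allele.

    Each genotype is a 2-tuple of allele labels, e.g. ("A", "G").
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def count_mendelian_errors(trios):
    """Count the trios whose genotypes are inconsistent at one marker."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)


# Hypothetical (father, mother, child) genotypes at a single marker.
trios = [
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("A", "A"), ("A", "A"), ("A", "G")),  # error: neither parent carries G
    (("G", "G"), ("A", "A"), ("A", "G")),  # consistent
]

print(count_mendelian_errors(trios))  # 1
```

A marker with a count above zero in this sketch corresponds to a row that the threshold step below would inactivate.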

  • Right-click the Mendelian errors column and select Activate by Threshold.
  • Select <= 0 and click OK.

This inactivates every row (marker) that has one or more Mendelian errors. We will use the remaining active rows to activate the corresponding columns in the CEU All - Sheet 1 spreadsheet.

  • Open CEU All - Sheet 1 and select Select >Activate or Inactivate based on Second Spreadsheet.
  • Choose Columns in the first drop down box, then choose the PBAT QA Results spreadsheet at the bottom of the dialog.
  • The dialog should read: Set the state of Columns in the current spreadsheet to Active based on active Rows in the specified spreadsheet PBAT QA Results (by Marker). Click OK.

This will create a new spreadsheet, CEU All - Sheet 2, with 19,090 active columns.
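The logic of the steps in this section can be sketched outside SVS as follows. This is only an illustration in plain Python, not the SVS scripting interface: the marker names, error counts, genotype values, and function names are all hypothetical.

```python
def active_markers(error_counts, max_errors=0):
    """Mimic 'Activate by Threshold' with '<= max_errors':
    keep markers whose Mendelian error count is at or below the cutoff."""
    return {marker for marker, n in error_counts.items() if n <= max_errors}


def keep_columns(rows, columns):
    """Mimic activating genotype columns from the active rows of a second
    table: drop every column whose marker is not in `columns`."""
    return [{k: v for k, v in row.items() if k in columns} for row in rows]


# Hypothetical per-marker QA results: marker -> Mendelian error count.
mendelian_errors = {"rs0001": 0, "rs0002": 2, "rs0003": 0, "rs0004": 1}

# Hypothetical genotype table: one dict per sample, keyed by marker.
genotypes = [
    {"rs0001": "A_A", "rs0002": "A_G", "rs0003": "C_C", "rs0004": "T_T"},
    {"rs0001": "A_G", "rs0002": "G_G", "rs0003": "C_T", "rs0004": "T_C"},
]

clean = active_markers(mendelian_errors)
filtered = keep_columns(genotypes, clean)

print(sorted(clean))         # ['rs0001', 'rs0003']
print(sorted(filtered[0]))   # ['rs0001', 'rs0003']
```

Keeping the marker filter separate from the column selection mirrors the two-spreadsheet workflow above: the QA results decide which markers pass, and the genotype table is then subset by that decision.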

B. Quality Assurance by Sample

Figure 3b. PBAT QA Results by proband

The latest version of PBAT incorporates a novel test that assesses the genotyping quality of individual probands in family-based association studies. Published in PLoS Genetics [Fardo, 2009], these tests are “ideally suited as the final layer of quality assurance filters in the cleaning process of genome-wide association studies.”

  • Open CEU All - Sheet 2 and select Quality Assurance >PBAT Family-Based QA.
  • Again, check Use alternative rapid pedigree algorithm under Computation parameters.
  • This time select Output by proband under Output and click Run.

Another PBAT QA Results spreadsheet is created (Figure 3b), this time with quality control metrics for each proband. In the paper cited above, Fardo et al. suggest that, on a genome-wide scale, probands with a score greater than 30 should be considered to have poor genotyping quality.

  • Right-click on the Tgw column header and select Sort Descending.

Notice that 5 samples have a Tgw value greater than 30. However, this dataset contains genotypes for chromosome 22 only, so the reported statistics do not necessarily translate to a genome-wide scale. For this tutorial, therefore, we will not exclude any samples.
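As a rough illustration of the sort-and-inspect step, the sketch below ranks probands by Tgw and lists those above the cutoff of 30 mentioned above. The proband IDs and scores are made up, and the function is a hypothetical helper, not part of SVS or PBAT.

```python
TGW_CUTOFF = 30.0  # genome-wide cutoff quoted in the text above


def flag_probands(tgw_by_proband, cutoff=TGW_CUTOFF):
    """Return (proband, Tgw) pairs with Tgw strictly above the cutoff,
    sorted from highest to lowest score."""
    ranked = sorted(tgw_by_proband.items(), key=lambda item: item[1], reverse=True)
    return [(proband, score) for proband, score in ranked if score > cutoff]


# Hypothetical per-proband scores.
tgw_by_proband = {"P1": 12.4, "P2": 41.0, "P3": 30.0, "P4": 33.7}

flagged = flag_probands(tgw_by_proband)
print(flagged)  # [('P2', 41.0), ('P4', 33.7)]
```

The function only flags probands and does not remove them, which matches the decision above to inspect the chromosome 22 results without excluding any samples.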