1. Data Preparation

To run PBAT in SVS 7 you need, at minimum, a spreadsheet containing pedigree information (Family ID, Patient ID, Mother ID, Father ID, Sex, and Affection Status) and genetic data (either genotypes or continuous variables, such as log ratios). A fundamental change from previous versions of Golden Helix PBAT is how phenotype information is handled: to access phenotype data in SVS 7, you must first join it with your pedigree and genetic data. The following steps lead you through importing each data type separately and then merging them into a single spreadsheet.

A. Import Pedigree Information

Before you can begin, you need to create a new project.

  • Open SVS and from the Welcome Screen select File >New Project.
  • Name the project PBAT Tutorial, browse to a directory where you want the project saved and click New Project. This will open the Project Navigator.

Figure 2a. Pedigree spreadsheet.

The first file to import is CEU - PED.csv contained within the downloaded zip file. This is a comma-delimited CSV file with pedigree information for the CEU HapMap samples (Phase III).

  • Select Import >PBAT >Text Pedigree.
  • Browse to the directory where you saved CEU - PED.csv and click Open.
  • Under Row Labels select Use column number: 1.
  • For both Sex and Affection Status, choose the radio button labeled ...encoded as 0/1/2 (or ?/1/2).

Note

If the default options (?/0/1) are used, the spreadsheet will not be recognized as a pedigree spreadsheet.

  • Click OK.

This will create a new pedigree spreadsheet called CEU - PED Pedigree Dataset - Sheet 1 (Figure 2a).

Note

Pedigree spreadsheets are denoted as such by a pedigree icon in the Project Navigator as well as blue headers for pedigree columns at the front of the spreadsheet. If your imported spreadsheet has neither of these, it will not be recognized as a pedigree spreadsheet and certain analysis options will not be present.
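Because the import depends on how Sex and Affection Status are coded, it can help to scan a pedigree file for unexpected codes before importing it. The following is a minimal sketch in plain Python, not part of SVS. It assumes the CSV has a header row with columns named Sex and Affection Status; those header names, the function name, and the sample rows are hypothetical and may not match CEU - PED.csv. Note that the check only flags values outside the set 0, 1, 2, and ?; it cannot tell a 0/1/2 file from a ?/0/1 file.

```python
import csv
import io

# Codes accepted under the "0/1/2 (or ?/1/2)" import option.
ALLOWED_CODES = {"0", "1", "2", "?"}


def find_bad_codes(csv_text, columns=("Sex", "Affection Status")):
    """Return (line_number, column, value) for every value outside ALLOWED_CODES.

    The column names are assumptions about the file header.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in columns:
            value = (row.get(column) or "").strip()
            if value not in ALLOWED_CODES:
                problems.append((line_number, column, value))
    return problems


# Hypothetical sample data; the third data row uses "M" instead of a numeric code.
sample = (
    "Family ID,Patient ID,Father ID,Mother ID,Sex,Affection Status\n"
    "F1,P1,0,0,1,1\n"
    "F1,P2,0,0,2,2\n"
    "F1,P3,P1,P2,M,2\n"
)

bad = find_bad_codes(sample)
print(bad)
```

To check a real file, read its contents with `open(path, newline="").read()` and pass the text to `find_bad_codes`; an empty list means every value is one of the accepted codes.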

B. Import Phenotype Information

Figure 2b. Simulated phenotype spreadsheet

Next you need to import CEU - SIM - PHENO.csv. This is a comma-delimited CSV file with simulated phenotype information. It is used for demonstration purposes only.

  • From the Project Navigator select Import >Text.
  • Browse to the directory where you saved CEU - SIM - PHENO.csv and click Open.
  • Leave the rest of the parameters as defaults and click OK.

This will create a new spreadsheet called CEU - SIM - PHENO - Dataset - Sheet 1 (Figure 2b).

C. Import Genotypes

Figure 2c. Genotype spreadsheet.

Last, you need to import CEU - GENO - Chr22.dsf. This file contains actual genotypes on chromosome 22 for the CEU samples, which were generated by a combination of Affymetrix and Illumina platforms.

  • From the Project Navigator select Import >Golden Helix DSF.
  • Browse to the directory where you saved CEU - GENO - Chr22.dsf and click Open.
  • Leave the rest of the parameters as defaults and click OK.

This will create a new marker-mapped spreadsheet called CEU - GENO - Chr22 - Sheet 1 (Figure 2c).

D. Merge Spreadsheets

Now that you have all three spreadsheets in the project, you need to join them together. When joining spreadsheets, it does not matter which one you start from. However, if you want certain data (e.g., phenotype data) located toward the front of the spreadsheet for easier viewing, initiate the join from the spreadsheet that contains it. When pedigree data is available (and denoted as such), it will always occupy the first few columns of the spreadsheet.

  • Open CEU - PED Pedigree Dataset - Sheet 1 and select File >Join or Merge Spreadsheets.
  • From the spreadsheet chooser select CEU - SIM - PHENO - Dataset - Sheet 1 and click OK.
  • Enter PED + PHENO for New dataset name:.
  • Under Spreadsheet as Child of choose Current Spreadsheet.
  • Leave all other parameters as the defaults and click OK.

This will create a new spreadsheet PED + PHENO - Sheet 1. Now join this one with the genotype spreadsheet.

  • From PED + PHENO - Sheet 1 select File >Join or Merge Spreadsheets.
  • Select CEU - GENO - Chr22 - Sheet 1 and click OK.
  • Enter CEU All for New dataset name:.
  • Under Spreadsheet as Child of choose Project root.
  • Leave all other parameters as the defaults and click OK.

You now have all the data in one spreadsheet, CEU All - Sheet 1, and are ready for analysis.
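Conceptually, the two joins above match rows across spreadsheets by their row labels and place the columns of the second spreadsheet after those of the first. The sketch below illustrates that idea in plain Python; it is not SVS code. It assumes an inner join (only labels present in both tables are kept), which may differ from the defaults in the SVS join dialog, and all sample labels, column names, and values are hypothetical.

```python
def join_on_label(left, right):
    """Inner-join two {row_label: {column: value}} tables on the row label.

    Columns from `left` come first, followed by columns from `right`.
    """
    joined = {}
    for label, left_row in left.items():
        if label in right:
            joined[label] = {**left_row, **right[label]}
    return joined


# Hypothetical miniature versions of the three spreadsheets.
pedigree = {
    "S1": {"Family ID": "F1", "Sex": "1"},
    "S2": {"Family ID": "F1", "Sex": "2"},
}
phenotype = {
    "S1": {"Trait": 0.4},
    "S3": {"Trait": 1.2},  # no matching pedigree row, so it is dropped
}
genotype = {
    "S1": {"marker_1": "A_G"},
    "S2": {"marker_1": "G_G"},
}

# Same order as the tutorial: pedigree + phenotype first, then genotypes.
ped_pheno = join_on_label(pedigree, phenotype)
ceu_all = join_on_label(ped_pheno, genotype)
print(ceu_all)
```

Starting the join from the pedigree table keeps the pedigree columns at the front of each merged row, which mirrors the column ordering described in this section.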

Note

In addition to performing family-based association testing using genotypes as covariates, you can also perform association testing with various CNV covariates. Though not covered in this tutorial, PBAT CNV Analysis follows the same procedure as PBAT Genotype Analysis, except that you join your CNV data, rather than a genotype spreadsheet, with your pedigree and phenotype information. To learn more about processing CNV data, see the Copy Number Variation (CNV) Analysis Tutorial.