In practice, CNV association tests can be based on several data sources, including the log ratios (LRs), PCA-corrected LRs, the mean LR segmentation covariates, and the two-state or three-state discretized segment covariates. This tutorial uses the discretized segment covariates for association testing; the process is similar for the other data sources.
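To show how the discretized covariates relate to the mean LRs, the sketch below maps segment mean LRs to three-state calls (-1 loss, 0 neutral, 1 gain) and two-state calls (1 for any change, 0 otherwise). The ±0.1 cutoffs and the function names `three_state` and `two_state` are hypothetical and chosen for illustration. They are not taken from SVS, whose encoding depends on the settings used when the covariates were generated.

```python
# Hypothetical cutoffs for illustration only; not taken from SVS.
LOSS_THRESHOLD = -0.1
GAIN_THRESHOLD = 0.1

def three_state(mean_lr: float) -> int:
    """Map a segment mean log ratio to -1 (loss), 0 (neutral) or 1 (gain)."""
    if mean_lr <= LOSS_THRESHOLD:
        return -1
    if mean_lr >= GAIN_THRESHOLD:
        return 1
    return 0

def two_state(mean_lr: float) -> int:
    """Map a segment mean log ratio to 1 (any copy-number change) or 0 (neutral)."""
    return 0 if three_state(mean_lr) == 0 else 1

segment_means = [-0.45, -0.02, 0.03, 0.31]
print([three_state(x) for x in segment_means])  # [-1, 0, 0, 1]
print([two_state(x) for x in segment_means])    # [1, 0, 0, 1]
```

The two-state encoding discards the direction of the change, which is why the three-state covariates are usually more informative when gains and losses may have opposite effects.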
A. CNV Association Analysis
Before running association tests, we first need to join the phenotype information with the discretized segment covariates. Again, the phenotype information in this tutorial is simulated for demonstration purposes.
- Open the Sim_Pheno - Final Sample Set spreadsheet and select File > Join or Merge Spreadsheets.
- Select the Three State Covariates spreadsheet and click OK.
- Leave the default options in the Join or Merge Spreadsheet window and click OK.
The new spreadsheet, Sim_Pheno Dataset + Three State Covariates – Sheet 1, will be used for the CNV association tests.
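Conceptually, the join pairs each sample's phenotype row with its covariate row by sample ID. The sketch below assumes an inner join, so samples missing from either spreadsheet are dropped. Whether the default options in the Join or Merge dialog behave exactly this way is an assumption here, and the sample IDs, values, and the `inner_join` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: sample ID -> simulated phenotype value
phenotypes: Dict[str, float] = {"S1": 1.2, "S2": 0.4, "S3": 2.1, "S4": 0.9}

# Hypothetical toy data: sample ID -> three-state calls, one per segment column
covariates: Dict[str, List[int]] = {
    "S1": [0, 1, 0],
    "S2": [0, 0, -1],
    "S3": [1, 1, 0],
    "S5": [0, 0, 0],  # no phenotype, so the inner join drops it
}

def inner_join(
    pheno: Dict[str, float], covs: Dict[str, List[int]]
) -> Dict[str, Tuple[float, List[int]]]:
    """Keep only samples present in both tables, in the phenotype table's order."""
    return {sample: (pheno[sample], covs[sample]) for sample in pheno if sample in covs}

merged = inner_join(phenotypes, covariates)
for sample, (value, row) in merged.items():
    print(sample, value, row)
```

Sample S4 is dropped because it has no covariates, and S5 because it has no phenotype.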
- From the resulting spreadsheet, left-click the Phenotype 1 column label header once to turn the column magenta, denoting it as the dependent variable.
- Select Analysis > Numeric Association Tests.
- For this tutorial, set the parameters to match those in Figure 3-1 and click Run.
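The Corr/Trend P column reports a correlation/trend test between each covariate column and the dependent variable. As a rough illustration only, the sketch below computes Pearson's r and compares n·r² to a chi-square distribution with 1 degree of freedom. This is a common large-sample form of the trend test, but the exact statistic SVS computes may differ. The function `correlation_trend_p` and the data are hypothetical. The function returns None for a constant covariate (for example, a segment with no CNV in any sample), which is one plausible source of missing p-values.

```python
import math
from typing import Optional, Sequence

def correlation_trend_p(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Approximate p-value for association between covariate x and phenotype y.

    Uses Pearson's r and the large-sample statistic n * r**2, compared to a
    chi-square distribution with 1 degree of freedom. Returns None when either
    variable is constant, because r is undefined in that case.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    r = sxy / math.sqrt(sxx * syy)
    stat = n * r * r
    # Chi-square(1) survival function: P(X > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

phenotype = [1.2, 0.4, 2.1, 0.9, 1.7, 0.3]
covariate = [0, -1, 1, 0, 1, -1]  # three-state calls for one segment
constant = [0, 0, 0, 0, 0, 0]     # no CNV in any sample
print(correlation_trend_p(covariate, phenotype))
print(correlation_trend_p(constant, phenotype))  # None
```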
The resulting Association Tests spreadsheet contains four columns. To view the most significant markers, first exclude markers with missing p-values, then sort the Corr/Trend P column in ascending order.
- Right-click the Corr/Trend P column and choose ‘Inactivate Missings’. Create a subset by choosing Select > Subset Active Data.
- In the Association Tests - Active Subset spreadsheet, right-click the Corr/Trend P column (column 1) and select Sort Ascending.
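Inactivating missing values and sorting amount to a filter followed by an ascending sort. A minimal sketch, with hypothetical marker names and None standing in for a missing p-value:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical results: marker name -> Corr/Trend P (None marks a missing value)
results: Dict[str, Optional[float]] = {
    "marker_a": 0.21,
    "marker_b": None,
    "marker_c": 0.0031,
    "marker_d": 0.048,
}

def active_sorted(res: Dict[str, Optional[float]]) -> List[Tuple[str, float]]:
    """Drop markers with missing p-values, then sort by p-value ascending."""
    active = [(name, p) for name, p in res.items() if p is not None]
    return sorted(active, key=lambda item: item[1])

for name, p in active_sorted(results):
    print(f"{name}\t{p:g}")
```

Filtering before sorting avoids having to decide where missing values belong in the sort order.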
SNP_A-2186409 is the most significant marker, with a p-value of about 0.00075... (Figure 3-2), but it does not remain significant after Bonferroni correction (column 4). Keep in mind that the Bonferroni correction is based on the total number of tests performed. That total includes many redundant tests arising from slight inconsistencies in the start and end points of common CNV segments, as the visualization steps below reveal.
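The Bonferroni adjustment multiplies each p-value by the number of tests m and caps the result at 1 (equivalently, it compares p to α/m). The sketch below also illustrates why redundant segments matter: collapsing covariate columns whose calls are exact duplicates reduces m. The column data and helper names are hypothetical, and deduplicating exact copies is only a crude illustration; nearly identical columns remain correlated, so even the reduced count is conservative.

```python
from typing import List, Sequence

def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: p times the number of tests, capped at 1."""
    return min(1.0, p * n_tests)

def count_distinct_columns(columns: Sequence[Sequence[int]]) -> int:
    """Count covariate columns whose per-sample calls are not exact duplicates."""
    return len({tuple(col) for col in columns})

# Hypothetical covariate columns (per-sample calls, one list per segment).
# The first three differ only in segment boundaries, so their calls are identical.
columns: List[List[int]] = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
p = 0.00075
print(bonferroni(p, len(columns)))                     # every column counted
print(bonferroni(p, count_distinct_columns(columns)))  # duplicates collapsed
```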
- To plot these results, right-click the Corr/Trend -log10P column header and select Plot Variable in Genome Browser.
The resulting plot should resemble Figure 3-3. Note the most significant peak, on chromosome 22.
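The plotted value is -log10 of the p-value, so smaller p-values appear as taller peaks (p = 0.01 plots at 2). A minimal sketch of the transform, assuming all p-values are strictly positive:

```python
import math
from typing import List

def neg_log10(p_values: List[float]) -> List[float]:
    """Transform p-values to -log10(p) so smaller p-values plot as taller peaks."""
    return [-math.log10(p) for p in p_values]

print(neg_log10([0.1, 0.01, 0.00075]))
```

A p-value of about 0.00075 therefore plots at roughly 3.12 on the -log10 scale.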
It can be useful to view the underlying data alongside the association test results. A convenient way to do this is a heat map of the segment covariates, with samples sorted by phenotype status.
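Sorting samples by phenotype before drawing the heat map places samples with similar phenotypes next to each other, so a segment whose LRs track the phenotype shows up as a visible band. The text-mode sketch below orders hypothetical samples by phenotype value and renders each mean LR as `-`, `.`, or `+`. The ±0.1 cutoffs are illustrative assumptions, not the SVS CNV color defaults.

```python
from typing import Dict, List, Tuple

# Hypothetical merged data: sample ID -> (phenotype, mean LR per segment column)
merged: Dict[str, Tuple[float, List[float]]] = {
    "S1": (1.2, [0.02, 0.35, 0.28, -0.01]),
    "S2": (0.4, [-0.03, 0.01, -0.42, 0.04]),
    "S3": (2.1, [0.22, 0.31, 0.40, 0.00]),
    "S4": (0.9, [0.05, -0.02, 0.03, -0.38]),
}

# Illustrative cutoffs only (assumed, not SVS defaults)
LOSS, GAIN = -0.1, 0.1

def symbol(lr: float) -> str:
    """Render a mean LR as '-' (likely loss), '+' (likely gain) or '.' (neutral)."""
    if lr <= LOSS:
        return "-"
    if lr >= GAIN:
        return "+"
    return "."

def heatmap_rows(data: Dict[str, Tuple[float, List[float]]]) -> List[str]:
    """Return one text row per sample, ordered by ascending phenotype value."""
    ordered = sorted(data.items(), key=lambda item: item[1][0])
    return [
        f"{sample} {pheno:4.1f} " + "".join(symbol(lr) for lr in lrs)
        for sample, (pheno, lrs) in ordered
    ]

print("\n".join(heatmap_rows(merged)))
```

In this toy data the third segment column moves from loss to gain as the phenotype increases, which is the kind of pattern the sorted heat map makes easy to spot.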
- Open the Sim_Pheno - Final Sample Set spreadsheet. Following the steps outlined above, merge it with the Segmentation Covariates – Every Column spreadsheet, entering Sim_Pheno Dataset Final Sample Set + Segmentation Covariates Every Column as the new dataset name.
- Return to the p-value plot.
- Click User Graphs. On the Add Graph tab, choose From: Select Spreadsheet. In the resulting dialog, choose the merged spreadsheet created above (Sim_Pheno Dataset Final Sample Set + Segmentation Covariates Every Column - Sheet 1) and click OK.
- Heat Map should now be the only option on the Add Graph tab. Select it and click Add. A heat map of the segmentation covariates is drawn below the p-value plot.
- Click once on the merged spreadsheet's entry (Sim_Pheno Dataset Final Sample Set + Segmentation Covariates Every Column - Sheet 1) under User Graphs to open the heat map controls. On the Color tab, select Manual, then Set CNV Defaults.
- Click on the Group tab and click Select Variable. Choose Phenotype from the list and click OK.
- Double-click chromosome 22 in the Full Domain View to zoom to that chromosome. Then click and drag along the x-axis of the p-value plot to zoom to the region around the p-value peak.
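Zooming to the peak is equivalent to restricting the markers to one chromosome and a window around the most significant position. The sketch below does this for hypothetical markers; the `Marker` type, names, positions, p-values, and the 100 kb half-width are all assumptions for illustration.

```python
from typing import List, NamedTuple

class Marker(NamedTuple):
    name: str
    chrom: str
    position: int  # base-pair position
    p_value: float

def zoom_window(markers: List[Marker], chrom: str, center: int, half_width: int) -> List[Marker]:
    """Return markers on `chrom` within `half_width` bp of `center`, in position order."""
    return sorted(
        (m for m in markers if m.chrom == chrom and abs(m.position - center) <= half_width),
        key=lambda m: m.position,
    )

# Hypothetical markers for illustration only
markers = [
    Marker("m1", "22", 18_000_000, 0.40),
    Marker("m2", "22", 18_050_000, 0.002),
    Marker("m3", "22", 18_100_000, 0.03),
    Marker("m4", "22", 30_000_000, 0.70),
    Marker("m5", "21", 18_050_000, 0.01),
]

# Most significant marker on chromosome 22, then the window around it
peak = min((m for m in markers if m.chrom == "22"), key=lambda m: m.p_value)
for m in zoom_window(markers, "22", peak.position, 100_000):
    print(m.name, m.position, m.p_value)
```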
This concludes the association testing portion of this CNV study. The remaining step outlines additional visualization and plotting techniques for CNV data.