4. Data Visualization Techniques

Outlined below are additional data visualization techniques for examining your copy number data further.

A. Heat Map of All Sample Segment Covariates

Open the plot you created previously. Hide the p-value plot so that only the heat map is visible.

  • Uncheck Graph 1 in the Graph Control Interface.
  • Right click anywhere in the heat map and choose Reset Zoom.

The copy number variations become more apparent with this color scheme. Because the heat map is sorted by case/control status, color-consistent streaks in either the lower or the upper half of the plot could signal an association. Although there are no obvious differences between cases and controls at first glance, there are a couple of regions of interest. Figure 4-1 highlights large CNVs in individual samples in red and copy number variations shared among many samples in blue.
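The visual check described above, looking for streaks concentrated in cases or in controls, can also be roughed out numerically. The sketch below is an illustration only, not an SVS feature: it assumes you have one summary segment covariate per sample for the region of interest plus each sample's case/control label, and the function name and the 0.3 gain cutoff are hypothetical choices. It reports the fraction of cases and the fraction of controls whose covariate exceeds the cutoff.

```python
from __future__ import annotations


def gain_fraction_by_status(
    covariates: list[float],
    is_case: list[bool],
    threshold: float = 0.3,  # hypothetical log-ratio cutoff for calling a gain
) -> tuple[float, float]:
    """Return (fraction of cases with a gain, fraction of controls with a gain).

    covariates holds one summary segment covariate per sample for the
    region being inspected; is_case holds the matching case/control labels.
    """
    if len(covariates) != len(is_case):
        raise ValueError("covariates and is_case must have the same length")

    def fraction_above(values: list[float]) -> float:
        # Share of samples whose covariate exceeds the cutoff (0.0 if none).
        return sum(v > threshold for v in values) / len(values) if values else 0.0

    cases = [c for c, case in zip(covariates, is_case) if case]
    controls = [c for c, case in zip(covariates, is_case) if not case]
    return fraction_above(cases), fraction_above(controls)


# Made-up values for six samples: three cases followed by three controls.
print(gain_fraction_by_status([0.5, 0.4, 0.0, 0.1, -0.05, 0.6],
                              [True, True, True, False, False, False]))
```

A large gap between the two fractions is the numerical counterpart of a streak confined to one half of the heat map; a formal test on the underlying counts (for example, Fisher's exact test) would be the natural next step.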

Figure 4-1. Large, rare CNVs (indicated with red boxes) and shared common CNVs (blue boxes)

Let’s look at one of the common CNVs first.

  • Enter Chr17:28,628,125 - 43,321,368 in the genomic region box at the top of the window.

The plot should now look like Figure 4-2, with three common CNVs shown. Notice that not every sample has exactly the same start and end boundaries for each region. This is common when using the univariate segmentation method. As a consequence, the output spreadsheet containing the “first column” of each CNV segment will contain a substantial amount of redundant data. Because the same feature (probably a common indel region) is likely being detected in each subject, the overlapping area is probably the most accurate representation of the underlying biology. The multivariate segmentation method, by contrast, determines the most likely endpoints by comparing data across all samples, which results in uniform endpoints for common CNV regions.
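Taking the overlapping area of the per-sample calls amounts to intersecting their intervals: the latest start and the earliest end. The minimal sketch below assumes each sample's call for one common CNV is available as a (start, end) base-pair pair; the function name and the coordinates in the example are hypothetical and do not come from this project.

```python
from __future__ import annotations


def common_overlap(segments: list[tuple[int, int]]) -> tuple[int, int] | None:
    """Intersect the [start, end] calls made for one CNV in different samples.

    Returns the shared region, or None if the calls do not all overlap.
    """
    if not segments:
        return None
    start = max(s for s, _ in segments)  # latest start among the samples
    end = min(e for _, e in segments)    # earliest end among the samples
    return (start, end) if start <= end else None


# Made-up calls for the same CNV in three samples.
print(common_overlap([(34_400_000, 34_650_000),
                      (34_420_000, 34_700_000),
                      (34_390_000, 34_640_000)]))
```

Collapsing each group of redundant per-sample calls to a single interval this way is one simple option for reducing the redundancy described above; it does not replace the multivariate method, which estimates shared endpoints from the data itself.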

Figure 4-2. Three common CNV regions on chromosome 17

Let’s add the Database of Genomic Variants annotation track to see how this region has been cataloged, and the Affymetrix 500K probe track to see how densely these areas were genotyped.

  • Click on the Annotation Tracks node in the Graph Control Interface.
  • Under the Add Network Track tab check Affymetrix 500K, GHI and CNV, DGV and click Add.

The plot should now look like Figure 4-3. (You can enlarge the Annotation Tracks section by dragging the separator bar at the bottom of the heat map upward.)

Figure 4-3. Affymetrix 500K and Database of Genomic Variants tracks added to the plot

Notice that these regions have been extensively cataloged by the Database of Genomic Variants. You can zoom into each region to explore them further.

Now let’s take a look at the individual samples with larger chromosomal aberrations.

B. Investigating Individual Samples

You can now plot the LRs and segmentation covariates together to investigate these samples of interest.

  • In the heat map shown in Figure 4-3, double-click the 2 (chromosome 2) in the Full Domain View.

Notice the large gain on the left side of the plot (Figure 4-4).

Figure 4-4. Large gain on chromosome 2

To see which sample this is, zoom in on the Y-axis and click on the green streak.

  • Click and drag along the Y-axis to select the area around the green streak and zoom in on it.
  • Click on the green streak. Notice in the Data Console that this is sample S150.
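Clicking the streak is the quickest way to identify the sample, but the same question (which sample carries the longest stretch of gained markers?) can be sketched in code. This is a hypothetical illustration, not part of SVS: it assumes each sample's segment covariates along chromosome 2 are available as a list of values keyed by sample name, and it uses an arbitrary 0.3 cutoff to call a marker gained.

```python
from __future__ import annotations


def longest_gain_run(values: list[float], threshold: float = 0.3) -> int:
    """Length of the longest run of consecutive markers above the cutoff."""
    best = run = 0
    for v in values:
        run = run + 1 if v > threshold else 0  # extend or reset the current run
        best = max(best, run)
    return best


def sample_with_largest_gain(covariates_by_sample: dict[str, list[float]],
                             threshold: float = 0.3) -> str:
    """Name of the sample whose longest gained run is longest."""
    return max(covariates_by_sample,
               key=lambda name: longest_gain_run(covariates_by_sample[name], threshold))


# Made-up covariates for three samples over four markers.
print(sample_with_largest_gain({
    "S001": [0.0, 0.4, 0.0, 0.0],
    "S150": [0.5, 0.5, 0.5, 0.0],
    "S200": [0.0, 0.0, 0.1, 0.4],
}))
```

Counting markers rather than base pairs keeps the sketch simple; with marker positions available, summing the spanned base pairs would rank large aberrations more faithfully.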

Let’s add this sample’s LRs to the plot.

  • Select the User Graphs node in the Graph Control Interface.
  • Under the Add Graph tab, click the spreadsheet chooser drop-down and choose Select Spreadsheet...
  • Select PCA-Corrected Data (Center by Marker assumed) Transposed - Sheet 1 and click OK.
  • Check the box for S150 and click Add.

Now let’s plot the segmentation covariates on top of the LRs as before.

  • Select the S150 graph node (top) in the Graph Control Interface. Under the Add Item tab, click the spreadsheet chooser drop-down and choose Select Spreadsheet...
  • Select Segmentation Covariates Every Column Transposed and click OK.
  • Check the box for S150 and click Add.
  • Move the second S150 graph item up as before by clicking and dragging it above the first S150 graph item.

The graph should now look like Figure 4-5. Notice the shift in the LRs and segmentation covariates for that region.

Figure 4-5. S150 LRs and segmentation covariates displayed below heat map
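The shift is easier to interpret if you think of the segmentation covariates as a piecewise-constant summary of the LRs. The sketch below assumes, as a simplification, that each marker's segmentation covariate is the mean LR of the segment containing it, and that segment boundaries are given as hypothetical marker indices; it is not how SVS computes or stores these values internally.

```python
from __future__ import annotations


def segment_covariates(log_ratios: list[float], breakpoints: list[int]) -> list[float]:
    """Give every marker the mean log ratio of the segment that contains it.

    breakpoints are (hypothetical) marker indices at which a new segment
    starts; index 0 is implied, and out-of-range indices are ignored.
    """
    inner = sorted(b for b in set(breakpoints) if 0 < b < len(log_ratios))
    bounds = [0, *inner, len(log_ratios)]
    covariates: list[float] = []
    for start, end in zip(bounds, bounds[1:]):
        if end <= start:
            continue  # nothing to summarize (only happens for empty input)
        segment = log_ratios[start:end]
        mean = sum(segment) / len(segment)
        covariates.extend([mean] * len(segment))  # one value per marker
    return covariates


# Made-up LRs: three near-neutral markers, then three gained markers.
print(segment_covariates([0.0, 0.1, -0.1, 0.5, 0.7, 0.6], [3]))
```

The step from roughly 0.0 to roughly 0.6 in the output mirrors the shift visible in Figure 4-5: the noisy LRs scatter around each level, while the covariates jump cleanly at the segment boundary.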

Congratulations! You have now worked your way through an entire copy number analysis project, which is no easy task. We wish you the best of luck with your own study. As always, if you have any questions or need help with any portion of this tutorial or your own analysis, please give us a call. We’d be happy to help!