Copy Number Analysis

Copy Number Analysis Overview

Copy Number Variation

A normal base pair has two copies, one on each chromosome. (A base pair on the X chromosome in men will normally have only one copy.) Even if the two base pairs are different alleles, there are still considered to be two copies.

However, under certain circumstances, and especially in certain diseases, a base pair, or even an entire chromosome, may be replicated more than two times, appear just once, or be deleted entirely. The number of copies of a base pair is termed “copy number”, and variation in the copy number is termed “copy number variation” (CNV).

For both micro-array and array CGH (aCGH) scans, the more copies there are of a base pair, the higher the total intensity will be, irrespective of which alleles may be present, even if the base pair is a polymorphism. Typically, a lot of processing is needed to transform intensity data to a quantile-normalized log base-2 (log2) ratio of intensities of observations versus a reference population. When the intensities of the observations are the same as the reference population median for a given base pair, the log2 ratio will be equal to zero. Amplifications over the reference standard will be significantly greater than zero, and deletions will be significantly less than zero.
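As a minimal sketch of the log2 ratio described above, the following Python computes log2(sample intensity / reference median) per marker. The function name and data layout are hypothetical, and the quantile normalization step is omitted, so this illustrates the arithmetic only and is not the processing SVS performs.

```python
import math
from statistics import median

def log2_ratios(sample_intensities, reference_intensities):
    """Return log2(sample / reference median) for each marker.

    sample_intensities: one intensity per marker for a single sample.
    reference_intensities: for each marker, a list of intensities
    observed in the reference population (hypothetical layout).
    """
    ratios = []
    for value, reference in zip(sample_intensities, reference_intensities):
        ref_median = median(reference)
        ratios.append(math.log2(value / ref_median))
    return ratios

# Three markers with the same reference population (median 1000).
sample = [1000.0, 2000.0, 500.0]
reference = [[900.0, 1000.0, 1100.0]] * 3
print(log2_ratios(sample, reference))  # [0.0, 1.0, -1.0]
```

An intensity equal to the reference median gives 0, a doubling gives 1, and a halving gives -1.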

Copy Number Analysis Package (CNV Analysis)

Golden Helix SVS supports reading micro-array and aCGH log2 ratio data from the Affymetrix, Agilent, Illumina, and NimbleGen platforms, as well as processing Affymetrix CEL files to generate log2 ratios, with the object of determining where CNVs occur. Log2 ratio data from other platforms may also be prepared and analyzed by Golden Helix SVS by first converting it to the Affymetrix CNT text file format (see Affymetrix CNT File Format). Subsequently, association analysis can be performed on the log2 ratios directly or on related covariates over found CNV regions.

An abbreviated workflow for CNV association analysis is as follows:

  1. Import and/or prepare log2 ratio data.
  2. Perform quality assurance on log2 ratios.
  3. Execute CNAM (Copy Number Analysis Method) optimal segmenting on the “cleaned” log2 ratios.
  4. Create a new covariate spreadsheet consisting of average log2 ratios over found CNV regions for each sample.
  5. Import a spreadsheet with phenotypic data such as case-control status.
  6. Join the log2 ratio covariate spreadsheet with the spreadsheet containing phenotypic data.
  7. Perform association analysis on a phenotype with the covariates from the joined spreadsheet.

Preparing Log2 Ratio Data

The CNAM (Copy Number Analysis Method) optimal segmenting algorithm uses log2 ratio data as input. Therefore, before you can begin copy number analysis, you must import log2 data into an SVS project.

For the Affymetrix 500k, SNP 5.0, and SNP 6.0 arrays, SVS supports reading CEL intensity files and calculating normalized log2 ratios for copy number segmentation and analysis in SVS. For details about reading CEL intensity files, see Affymetrix CEL Files.

For the Affymetrix 10k, 100k, and 500k arrays, you may use the Affymetrix CNAT Batch Analysis tool to create CNT files; or for the 100k, 500k, and SNP 6.0 arrays use Genotyping Console to create CNCHP files. These files contain normalized log2 ratios and can be imported into SVS for analysis. See Affymetrix Files for instructions on creating and importing these files. Once the files have been parsed to extract the log2 ratio values, the data is ready for copy number analysis.

For the Illumina platform, you must use GenomeStudio with the SVS DSF Export Plug-In to export the log2 ratio values from your GenomeStudio project. For instructions on how to install and use the plug-in, see Exporting DSF Data from GenomeStudio using Plugin version 4.0.

For the Agilent platform, you must use the Agilent import menu item and select the correct intensity field for importing the log2 ratio data. See Agilent Files for more information.

For the NimbleGen platform, you must use the NimbleGen Data Summary Files import menu item and select at least one field to import. See NimbleGen Data Summary Files for more information.

You may also create a minimal data file with your own normalized log2 ratio data using the Affymetrix CNT file format. This file must contain log2 ratio data and marker map data. See Affymetrix CNT File Format for details. You can then import the Affymetrix CNT files to create a spreadsheet of log2 ratio data.

Using CNAM (Copy Number Analysis Method) Optimal Segmenting

CNAM Optimal Segmenting represents the second step for performing copy number analysis after importing log2 ratio data into a project and applying an appropriate genetic marker map. See Genetic Marker Maps Overview for more information. CNAM Optimal Segmenting uses both the genetic marker map information and the log2 ratios in the spreadsheet to discover regions of markers in which the log2 ratios vary significantly from segment to segment. While the genome has numerous regions of copy number variation, these regions are approximated by the segments found with the CNAM Optimal Segmenting algorithm. These segments will, with high probability, be where there are regions of copy number loss, neutral or gain in the data.

Upon segmenting, at least two new spreadsheets are created in the current SVS project: the segment means spreadsheet and the covariates spreadsheet. The segment means spreadsheet lists every region computed, its beginning and ending marker, and the segment mean log2 ratio value for every sample within that region. A covariate segment is created for all start and end positions for all samples. Each sample will have exactly the same number of covariates. The value of a sample’s covariates is determined by the segment mean for the segment that the covariate start and end positions are contained in. The covariates spreadsheet can be output in one of two formats: either a column is created for every active marker in the spreadsheet that was segmented, or a column is created for the first marker in every covariate segment. Optionally, a Wiggle file containing the locations of these regions may also be generated.

Options and other fields within the CNAM Optimal Segmenting tool are described below (see CNAM Optimal Segmenting Window).

CNAM Optimal Segmenting Window

Log2 Ratio Spreadsheet

In order to use CNAM optimal segmenting on a spreadsheet, a spreadsheet must contain log2 ratios and have a genetic marker map applied. From this spreadsheet, select Numeric > CNAM Optimal Segmenting.

Selecting Chromosomes

For large datasets, it is better to segment only one chromosome, or a few chromosomes, at a time. Because CNAM Optimal Segmenting does not segment across chromosomal boundaries, results will not change when the segmenting is subdivided by chromosome.

To select a chromosome or a few chromosomes, use the Select > Activate by Chromosomes option then select the chromosomes you wish to segment. It is not necessary to create a subset spreadsheet, as the segmenting algorithm will only run on active numeric columns.

Chromosome Segmenting Options

Variations of the CNAM Optimal Segmenting algorithms for obtaining the regions of CNV are documented in the Formulas and Theories chapter; see CNAM Optimal Segmentation Algorithm. Certain parameters for this algorithm may be changed within these segmenting options.

Algorithm

CNAM offers two types of segmenting methods, univariate and multivariate. These methods are based on the same algorithm, but use different criteria for determining cut-points denoting CNV boundaries.

The multivariate method segments all samples simultaneously, finding general CNV regions which may be similar across all samples. This method is preferable for finding very small CNV regions. For a given sample, the covariate is the mean of the log2 ratios within each segment for that sample. These covariates can then be used for association analysis. This model makes the tenuous assumption that for a given disease, the beginning and end of a CNV region will be similar for subsets of the cases. That is, if the regions are conserved for enough cases it is expected there is sufficient power to find a statistical association. If this assumption holds true, very small CNV regions can be found because the signal will be assessed over multiple samples.
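To illustrate how a covariate is derived under the multivariate method, the sketch below takes one set of segment boundaries shared by all samples and computes the mean log2 ratio within each segment for each sample. The function name, the boundary representation (segment start indices), and the data are assumptions made for illustration.

```python
def segment_means(log2_ratios, boundaries):
    """Mean log2 ratio within each segment for one sample.

    boundaries: sorted marker indices at which each segment starts,
    beginning with 0 (hypothetical representation).
    """
    edges = list(boundaries) + [len(log2_ratios)]
    means = []
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = log2_ratios[start:stop]
        means.append(sum(segment) / len(segment))
    return means

# The same boundaries are applied to every sample.
shared_boundaries = [0, 3, 5]
sample_a = [0.0, 0.25, -0.25, 1.0, 1.5, 0.0, 0.0]
sample_b = [0.0, 0.0, 0.0, -1.0, -0.5, 0.5, -0.5]

print(segment_means(sample_a, shared_boundaries))  # [0.0, 1.25, 0.0]
print(segment_means(sample_b, shared_boundaries))  # [0.0, -0.75, 0.0]
```

Each sample ends up with the same number of covariates, one per shared segment, which is what makes the values directly comparable in association analysis.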

In reality there may not always be consistent CNV regions across multiple samples. The univariate method segments each sample separately, finding the cut-points of each segment for each sample individually, and a spreadsheet is created showing all unique cut-points found among all samples. The univariate method discovers the optimal segments for each sample and outputs the mean, for every sample, of every unique segment found across all samples. This output can be displayed in one of two formats ready for subsequent association analysis or for plotting results. The output spreadsheets are discussed in Outputs from CNAM Optimal Segmenting.

Univariate Outlier Removal

The univariate outlier removal option helps to address the influence of large negative or large positive values on determining segment boundaries. It works by excluding found cut-points that bracket single-marker segments before running permutation tests to determine the strength of the segment. This option is only valid when the minimum number of markers per segment is set to 1. If outliers are not removed and the minimum number of markers per segment is set to a number greater than one, a single-marker outlier could force adjacent markers to create a segment that is driven only by the single outlier. This would inflate the number of segments that have the minimum number of markers allowed, and would specify boundaries incorrectly if the number of markers in the region were actually less than the minimum number of markers allowed in a segment. If the minimum number of markers in a segment is set to one and the univariate outlier removal box is not checked, single-marker segments will be found, but they will not be deemed significant with permutation testing. As a result, the algorithm looks for fewer segments at the expense of the larger, real segments. See CNAM Optimal Segmentation Algorithm for more details on this option.

Use Moving Window

If “Moving Window” is selected, the segmenting is performed using a moving window of markers that sweeps across each chromosome. Segmentation is constrained to the window, and then the results from each region are combined to produce the whole-chromosome results. This can greatly reduce the run time of the algorithm for large chromosomes, but may also introduce edge effects. CNAM chooses window boundaries in such a way that edge effects are reduced, but still cannot guarantee globally optimal results when using a moving window. The run log contains details on what window boundaries are chosen by CNAM.

Moving window size (markers per window)

The number of consecutive markers analyzed in the moving window. This option is only available if “Use Moving Window” is selected. Smaller moving window sizes speed up the run-time of the algorithm, especially when not using hardware acceleration. Note, however, there is a somewhat higher risk of false discoveries using a moving window approach as there is the potential for anomalies due to looking at a window of data instead of all of it. Permutation testing does minimize this.

Max segments

CNAM uses the following basic approach to segment a chromosome:

1. Identify candidate segments by performing a least-squares fit of segment boundaries to the log2 ratios.

2. Perform permutation testing to discard segments that don’t vary significantly from their neighbors. Continue discarding until only significant segments remain.

This option determines how many candidate segments CNAM starts with for each chromosome. It can be thought of as an upper limit on the number of segments that CNAM will be able to detect. Larger values make it possible to detect more copy number variations, but also slow down CNAM since more candidate segments must be found and permutation tested.

There are two ways to specify this limit:

1. Per 10,000 markers in a chromosome: By default CNAM will look for up to 10 segments per 10,000 markers in a chromosome. This per-10k limit can be changed to suit your needs. So, if a sample has 40,000 markers in Chromosome 1, CNAM would search for up to 40 CNVs in that chromosome. All chromosomes will be treated as having at least 10,000 markers when computing this limit.

2. Single constant value: Alternatively, it is possible to specify a single limit that will be used for all chromosomes (or for each window, if the moving window option is enabled) regardless of size or number of markers.
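The per-10,000-marker limit can be worked through as a short calculation. In the sketch below the function name is hypothetical, and rounding up for marker counts that are not a multiple of 10,000 is an assumption; the text above does not state how fractional limits are handled.

```python
import math

def max_segments_for_chromosome(n_markers, per_10k=10):
    """Upper limit on candidate segments for one chromosome.

    Chromosomes with fewer than 10,000 markers are treated as
    having 10,000. Rounding up is an assumption of this sketch.
    """
    effective_markers = max(n_markers, 10_000)
    return math.ceil(per_10k * effective_markers / 10_000)

print(max_segments_for_chromosome(40_000))  # 40
print(max_segments_for_chromosome(2_500))   # 10
```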

When performing univariate segmentation, smaller values for “Max segments” are ideal for detecting large rare variants. Smaller values also help keep the run-time manageable. Large “Max segments” values can detect common smaller variants, but also suffer from increased false positives due to additional multiple testing. Consider using the multivariate algorithm to detect small, common CNVs.

When performing multivariate segmentation, more segments are usually needed to detect smaller, common CNVs. The multivariate algorithm’s performance also scales better with more segments, so increasing this will have less effect on run-time compared to the univariate algorithm.
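Step 1 of the basic approach above, the least-squares fit of segment boundaries, can be illustrated with a textbook dynamic program that finds the k segments minimizing the total squared deviation from the segment means. This is a generic sketch for a single sample and is not claimed to be CNAM's implementation, which is documented in CNAM Optimal Segmentation Algorithm; all names are hypothetical.

```python
def best_segmentation(values, k):
    """Start indices of the k segments with minimal total squared error."""
    n = len(values)
    # Prefix sums give the squared error of any span in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        # Sum of squared deviations from the mean of values[i:j].
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[s][j]: minimal cost of splitting values[:j] into s segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                candidate = best[s - 1][i] + cost(i, j)
                if candidate < best[s][j]:
                    best[s][j] = candidate
                    back[s][j] = i

    # Walk the back-pointers from the end to recover segment starts.
    starts, j = [], n
    for s in range(k, 0, -1):
        j = back[s][j]
        starts.append(j)
    return sorted(starts)

log2_values = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(best_segmentation(log2_values, 3))  # [0, 3, 6]
```

The cost of this search grows with the number of segments requested, which is consistent with the note above that larger “Max segments” values slow down segmentation.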

Min #markers per segment

This constrains the algorithm to only find CNV regions with this minimum number of markers in each segment.

This parameter allows you to prevent finding CNV regions based on short spans of noise. In general the permutation testing should prevent small spurious segments from showing up, but a good default for this parameter is 1 marker with univariate outlier removal on for univariate analysis. For multivariate analysis, a minimum of 1 marker is still a good default. It is important to take into account any outliers in the log2 ratios for a sample. Outliers can still drive the segmentation results even after permutation testing, although their effect is minimized. To remove their effect, use the univariate outlier removal option.

Max pairwise segment p-value

The “Max segments” parameter sets an upper bound on the number of segments found. However, the problem remains to determine the actual number of valid CNV regions in the data. The process used is, once a set of k segments is found, each pairwise set of segments is compared through a permutation testing procedure. If every pair is statistically significant according to the “Max pairwise segment p-value”, then the k-way split is retained. Otherwise, the algorithm continually decreases k by one until every adjacent segment is significantly different from its neighbor or no segments are found, whichever comes first.
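The reduction procedure described above can be outlined as a loop that starts from the maximum number of segments and decreases k until every adjacent pair passes the significance test. In this sketch `find_segments` and `pair_is_significant` are hypothetical callables supplied by the caller; the toy versions shown (equal-width chunks and a fixed mean-difference cutoff) are stand-ins for the real least-squares fit and permutation test.

```python
def reduce_until_significant(max_segments, find_segments, pair_is_significant):
    """Decrease k by one until all adjacent segment pairs are significant.

    find_segments(k) returns a list of k segments (lists of values).
    pair_is_significant(left, right) returns True or False.
    Returns the retained segments, or [] if no split survives.
    """
    for k in range(max_segments, 1, -1):
        segments = find_segments(k)
        pairs = zip(segments, segments[1:])
        if all(pair_is_significant(a, b) for a, b in pairs):
            return segments
    return []

# Toy stand-ins, for illustration only.
values = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

def toy_find_segments(k):
    size = len(values) // k
    chunks = [values[i * size:(i + 1) * size] for i in range(k - 1)]
    return chunks + [values[(k - 1) * size:]]

def toy_pair_is_significant(left, right):
    mean_left = sum(left) / len(left)
    mean_right = sum(right) / len(right)
    return abs(mean_left - mean_right) > 0.5

result = reduce_until_significant(3, toy_find_segments, toy_pair_is_significant)
print(result)  # [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```

With three segments the toy test fails for both adjacent pairs, so the loop falls back to the two-segment split, which passes.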

Larger p-values increase sensitivity by rejecting fewer segment pairs, but also increase the false-discovery rate. Conversely, smaller p-values decrease the false-discovery rate but also decrease sensitivity. Smaller p-values also require more permutations to accurately test, and can significantly increase the segmentation running-time.

CNAM uses random permutation testing to estimate the p-value for each segment pair. CNAM evaluates $\frac{10}{p_{max}}$ random permutations of the log ratios from the segment pair, where $p_{max}$ is this parameter. Each permutation is checked to see if it has a better split (smaller sum of squared deviations from the means) than the original input segments. If the proportion of random permutations that have a better split is greater than $p_{max}$, then the pair is rejected as insignificant.
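A minimal sketch of this permutation test for one pair of adjacent segments is given below. Several details are assumptions not stated in the text: the "best split" of a permutation is taken as the minimum over all possible split positions, the number of permutations is rounded to the nearest integer, and a fixed random seed is used by default for reproducibility. All function names are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_sse(values):
    """Smallest two-segment squared error over all split positions."""
    return min(sse(values[:i]) + sse(values[i:]) for i in range(1, len(values)))

def pair_is_significant(left, right, p_max=0.05, rng=None):
    """Permutation test for one pair of adjacent segments."""
    rng = rng or random.Random(0)            # seeded for reproducibility
    n_permutations = max(1, round(10 / p_max))
    observed = sse(left) + sse(right)
    pooled = list(left) + list(right)
    better = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if best_split_sse(pooled) < observed:
            better += 1
    # Reject the pair when too many permutations split better.
    return better / n_permutations <= p_max

noise = [0.0, 0.02, -0.02, 0.01, -0.01] * 4
shifted = [1.0 + v for v in noise]
flat = [0.1, -0.1] * 10

print(pair_is_significant(noise, shifted))  # True
print(pair_is_significant(flat, flat))      # False
```

With the default of 0.05 this evaluates 200 permutations per pair. Lowering the parameter to 0.005 would evaluate 2,000, which is why smaller p-values lengthen the run time.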

Segment Means Output

These options select which segment means output to generate; see Outputs from CNAM Optimal Segmenting for details.

Log Output

Here you can enable the Full Logging option. This option outputs extra messages that more thoroughly detail CNAM’s activity.

Hardware Options

Several options exist to improve CNAM’s performance on modern computers.

Number of CPU Threads

Both the Univariate and Multivariate algorithms can take advantage of multi-processor or multi-core machines by performing some of their work in parallel threads. It is usually a good idea to match this number to the number of computational cores you have available on your system. The number of cores detected will be displayed to the right of this option.

This option only affects the number of threads on the system CPU, not on any hardware-accelerated devices (such as GPUs). For accelerated devices, CNAM automatically chooses the ideal number of threads. However, some operations (such as permutation testing) do not use hardware acceleration, so this option should still be set correctly even when using hardware acceleration.

Use Hardware Acceleration

CNAM can now take advantage of Graphics Processing Units (GPUs) and other OpenCL-compatible devices to speed up segmentation. This option can dramatically improve performance without sacrificing accuracy. To use this option, you will need a device that supports OpenCL, such as a modern graphics card. You will also need up-to-date drivers to ensure full support.

If you have more than one OpenCL capable device, you can use the device drop-down to choose which one you want CNAM to use. Currently, CNAM does not support using multiple OpenCL devices simultaneously. For device details and troubleshooting information, click the OpenCL Info... button.

Note

For Windows Remote Desktop users, most GPUs cannot be used when running SVS via Remote Desktop. This is because Remote Desktop sessions use a special video driver that is incompatible with OpenCL. Hopefully a work-around will be available in the future.

Specify Memory Limit

This option allows users to fine-tune the memory usage for multivariate segmentation. When left unchecked, SVS will estimate a good memory limit based on your current hardware. Specifying this parameter can improve performance on high-end hardware, or improve stability on low-end hardware.

If you are using a GPU, this option limits the amount of video memory CNAM will use. If using a CPU, it will limit the amount of system RAM used. It is a good idea to make this limit smaller than the total amount of memory available in order to leave room for the operating system, device drivers, and other software.

Optional Output Files

On the Optional Output tab, checking the Optional Bookmark File Output box exports the segment means to a UCSC Wiggle Track (WIG) file format file for Genome Browser import. Use the Browse button for file name selection.

If the WIG files are output while using the Univariate segmenting algorithm, the browse button will have you select a directory location as a WIG file will be generated for each sample. These files will be named using the sample name from the file.
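For readers unfamiliar with the format, the sketch below writes segment means as a UCSC Wiggle track using `variableStep` declarations with a `span` equal to the segment length. The exact layout of the files SVS writes is not described here, so this shows only one valid way to encode intervals in WIG; the function name and input tuples are hypothetical.

```python
def wig_lines(track_name, segments):
    """Build WIG lines for (chrom, start, end, mean) segments.

    Positions are assumed to be 1-based and inclusive.
    """
    lines = [f'track type=wiggle_0 name="{track_name}"']
    for chrom, start, end, mean in segments:
        span = end - start + 1
        # One declaration per segment, because span differs by segment.
        lines.append(f"variableStep chrom={chrom} span={span}")
        lines.append(f"{start} {mean}")
    return lines

example = wig_lines("sample1", [("chr1", 1001, 2000, -0.5),
                                ("chr1", 5001, 5500, 0.75)])
print("\n".join(example))
```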

Excluding Markers

If desired, markers can be excluded from the segmenting algorithm and its results by inactivating the columns corresponding to those markers.

Run Log

A log can be viewed during the computation by clicking on the Run Log tab after selecting the desired options and before running CNAM. The log informs you of the progress in segmenting sub-regions of the total region of markers being analyzed. If the number of segments found in a given window is equal to the maximum number of segments per window, a warning message will be printed in red, suggesting the user consider increasing that parameter.

Note

During processing, a normal progress bar is also shown in a separate window.

Outputs from CNAM Optimal Segmenting

CNV Covariates Spreadsheet

The first one or two spreadsheets created upon segmenting are covariates spreadsheets. Each contains the average log2 ratio value for each sample within each segment of markers. In these spreadsheets, the rows correspond to the samples, the columns correspond to the overall segments which have been determined, and the data are the average log2 ratio values. Each spreadsheet created also has the original marker map applied. There are two covariate spreadsheet variants; one or both can be output from the segmenting results. They are described below.

First column of each segment

In this variant, each column is identified by the chromosome number and the beginning marker of each segment (see Segment Covariates Output for First Marker of Segment Only). The markers are identified by chromosome position. A new column is created every time there is a new cut-point over all of the samples. This creates common segments for all samples, although for a particular sample there may be more columns than there are cut-points. In the case where a new column is introduced but a cut-point was not found for that sample, the CNV segment mean is repeated for all columns in a segment. A new segment mean only occurs for each new segment for each sample. This spreadsheet is ideal for association testing as the Bonferroni multiple testing correction is reduced.

Segment Covariates Output for First Marker of Segment Only

Segment Covariates Output for Every Marker

One column per marker

In this variant, each column is identified by the marker name (see Segment Covariates Output for Every Marker). A new column is created for every marker present in the original segmented spreadsheet. The mean segment value is repeated for every marker in each segment found, and only changes when cut-points are reached. This spreadsheet is ideal to use for plotting as there is a value for every marker, better demonstrating copy number amplifications and deletions.
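The two covariate layouts can be illustrated with a small sketch that takes per-sample segment starts and segment means (as the univariate method would produce), forms the union of all cut-points, and builds both the first-marker-per-segment and the one-column-per-marker rows. The function name and input structure are assumptions for illustration.

```python
from bisect import bisect_right

def covariate_rows(n_markers, per_sample_segments):
    """Build both covariate layouts.

    per_sample_segments: {sample: (starts, means)}, where starts are
    sorted marker indices beginning with 0 and means has one value
    per segment (hypothetical structure).
    """
    union_starts = sorted(
        {s for starts, _ in per_sample_segments.values() for s in starts}
    )
    first_marker, every_marker = {}, {}
    for sample, (starts, means) in per_sample_segments.items():
        def mean_at(index):
            # Mean of the sample's own segment containing this marker.
            return means[bisect_right(starts, index) - 1]
        first_marker[sample] = [mean_at(s) for s in union_starts]
        every_marker[sample] = [mean_at(i) for i in range(n_markers)]
    return union_starts, first_marker, every_marker

segments = {
    "A": ([0, 2], [0.0, 0.8]),
    "B": ([0, 4], [0.1, -0.6]),
}
columns, first, every = covariate_rows(6, segments)
print(columns)     # [0, 2, 4]
print(first["A"])  # [0.0, 0.8, 0.8]
print(every["B"])  # [0.1, 0.1, 0.1, 0.1, -0.6, -0.6]
```

Sample A has no cut-point at marker 4, so its mean of 0.8 is repeated in that column. The first layout has three columns to test instead of six, which is why it carries a smaller multiple testing burden.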

CNV Segment List Spreadsheet

The third spreadsheet created upon segmenting is a list of CNV segment means (see Segment List Spreadsheet). This spreadsheet contains columns for the chromosome name, segment base start position, segment base end position, segment mean, the number of markers in the segment, the segment column start index, and the segment column end index. Spreadsheet rows correspond to segments for each sample. If a sample has 1,000 segments, then there will be 1,000 rows for the sample before the next sample starts.

Segment List Spreadsheet

Segment Run Log

The segment run log displayed while segmenting is saved and output as the Segment Run Log for future reference (see Segment Run Log). This log reports whether the maximum number of segments was reached and whether the window size was doubled to avoid potential edge issues with the segmenting algorithm.

Segment Run Log

Wiggle Track (WIG) File

If you chose to output WIG files they will be saved in the directory you selected.

Manipulating and Analyzing CNAM Output

Several options for further investigating CNAM output, such as the Segment List spreadsheet, are located in the CNV sub-menu of the Analysis menu in the spreadsheet toolbar. The available functions are outlined below.

Discretize CN Segment Covariates with Counts

Discretizes the Segment Covariates based on a two- or three-state copy number model, specified by the user.

If the three-state model is selected, two thresholds must be specified. The values in the covariate spreadsheet are replaced by a -1 if the segment mean is below the lower threshold, a 0 if it is between the two thresholds, and a 1 if it is above the upper threshold.

If a two-state model is selected, one threshold must be specified. The two state models include a copy number loss model and a copy number gain model. If the loss model is selected, a value less than the threshold is indicated with a 1 and values above the threshold are indicated with a 0. If the gain model is selected, a value greater than the threshold is indicated with a 1 and values below the threshold are indicated with a 0.

If there is a missing value then it will still be indicated as missing.
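The two- and three-state rules above can be written as a short sketch. The function names are hypothetical, missing values are represented as `None`, and a value exactly equal to a threshold is treated as neutral (0), which is an assumption since the text does not specify the boundary case.

```python
def discretize_three_state(value, lower, upper):
    """-1 for loss, 0 for neutral, 1 for gain; None stays missing."""
    if value is None:
        return None
    if value < lower:
        return -1
    if value > upper:
        return 1
    return 0

def discretize_two_state(value, threshold, model):
    """model is 'loss' or 'gain'; 1 marks the selected state."""
    if value is None:
        return None
    if model == "loss":
        return 1 if value < threshold else 0
    if model == "gain":
        return 1 if value > threshold else 0
    raise ValueError("model must be 'loss' or 'gain'")

means = [-0.5, 0.0, 0.6, None]
print([discretize_three_state(m, -0.3, 0.3) for m in means])  # [-1, 0, 1, None]
print([discretize_two_state(m, -0.3, "loss") for m in means])  # [1, 0, 0, None]
print([discretize_two_state(m, 0.3, "gain") for m in means])   # [0, 0, 1, None]
```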

A second spreadsheet will also be created reporting the number of copy number loss, neutral and gain values for each marker in the segmentation covariates spreadsheet. Also included in the output are the mean value for the marker and the absolute difference between the threshold values and the mean marker value. If a two-state model is used the appropriate count column will contain only zeros and the absolute difference column not used will be filled with missing values.

Count Number of Segments per Sample

This function can be used on the Segment List spreadsheet that is output from CNAM Optimal Segmenting or the Homozygous Runs... spreadsheet that is output from the Runs of Homozygosity tools.

This function will count the number of segments or runs found in each sample and will provide the total combined length for each sample. The count and length information is output as a new child node spreadsheet of the Segment List or Homozygous Runs spreadsheets.
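The per-sample summary can be sketched as follows. The row layout (sample, chromosome, start, end, mean) and the function name are hypothetical, and segment length is computed as end minus start plus one, which assumes inclusive base positions.

```python
from collections import defaultdict

def segments_per_sample(rows):
    """Count segments and sum their lengths for each sample."""
    summary = defaultdict(lambda: {"count": 0, "total_length": 0})
    for sample, _chrom, start, end, _mean in rows:
        summary[sample]["count"] += 1
        summary[sample]["total_length"] += end - start + 1
    return dict(summary)

segment_rows = [
    ("S1", "1", 1000, 1999, -0.4),
    ("S1", "1", 2000, 4999, 0.1),
    ("S2", "1", 1000, 4999, 0.0),
]
print(segments_per_sample(segment_rows))
# {'S1': {'count': 2, 'total_length': 4000}, 'S2': {'count': 1, 'total_length': 4000}}
```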

Sample Statistics for Discretized Segment List

Summary statistics are calculated for the Discretized Segment List spreadsheet, created by the Discretize CN Segment List function. Summary measures are output as separate spreadsheets containing summary statistics by chromosome and are as follows:

  • Loss Count by Chromosome
  • Loss Minimum Segment Length by Chromosome
  • Loss Maximum Segment Length by Chromosome
  • Loss Mean Segment Length by Chromosome
  • Gain Count by Chromosome
  • Gain Minimum Segment Length by Chromosome
  • Gain Maximum Segment Length by Chromosome
  • Gain Mean Segment Length by Chromosome

Each of the measures listed above is output in a separate spreadsheet with samples as rows and one summary column per chromosome. All of the above measures are also calculated over all chromosomes and contained in a separate output spreadsheet, All Chromosome Segment Statistics.

Note

All lengths are reported in units of kilobase pairs (kb).

Discretize CN Segment List

Discretizes the Segment List based on a two- or three-state copy number model, specified by the user.

If the three-state model is selected, two thresholds must be specified. The values in the covariate spreadsheet are replaced by a -1 if the segment mean is below the lower threshold, a 0 if it is between the two thresholds, and a 1 if it is above the upper threshold.

If a two-state model is selected, one threshold must be specified. The two state models include a copy number loss model and a copy number gain model. If the loss model is selected, a value less than the threshold is indicated with a 1 and values above the threshold are indicated with a 0. If the gain model is selected, a value greater than the threshold is indicated with a 1 and values below the threshold are indicated with a 0.

Missing values will still be indicated as missing.

Create Sparse Segment Matrix

Converts either the segment list spreadsheet or the runs of homozygosity (ROH) spreadsheet into a sparse matrix with one column per segment per sample.

To create unique column names, the sample name, chromosome name and start position are concatenated, as this combination is assumed to be unique. A single column is created for each row in the segment list or ROH spreadsheet. All values in the column are missing (0’s for the ROH sparse matrix) except for the particular sample that has a segment mean (1’s for the ROH sparse matrix) for that segment.
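A minimal sketch of the sparse matrix construction for a segment list follows. The underscore separator in the column names, the row layout, and the function name are assumptions; missing values are represented as `None`.

```python
def sparse_segment_matrix(rows, fill=None):
    """One column per segment-list row; one matrix row per sample.

    rows: (sample, chrom, start, end, mean) tuples (hypothetical layout).
    fill: value for samples that do not own the segment
          (None here; the ROH variant would use 0).
    """
    samples = sorted({row[0] for row in rows})
    # Sample, chromosome and start position are joined to name columns.
    columns = [f"{s}_{c}_{start}" for s, c, start, _end, _mean in rows]
    matrix = {s: [fill] * len(rows) for s in samples}
    for j, (sample, _c, _start, _end, mean) in enumerate(rows):
        matrix[sample][j] = mean
    return columns, matrix

segment_list = [
    ("S1", "1", 1000, 1999, -0.4),
    ("S2", "1", 1500, 2500, 0.6),
]
names, matrix = sparse_segment_matrix(segment_list)
print(names)   # ['S1_1_1000', 'S2_1_1500']
print(matrix)  # {'S1': [-0.4, None], 'S2': [None, 0.6]}
```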

In the marker map created for the spreadsheet, both the Position and Stop position (“End Position” in the list spreadsheets) are included. This enables GenomeBrowse to plot the values of the sparse matrix as intervals in the heat map.

Visualizing Copy Number Analysis Results

There are several ways to visualize copy number analysis results, depending on which results you want to view. The methods below are not exhaustive, but they represent the typical ways CNV data is viewed.

Log2 Ratios

If the quantile-normalized intensities of the log2 ratio data are to be plotted for visual inspection for the presence of CNVs, this can be done from the log2 ratio spreadsheet. From the log2 ratio spreadsheet, go to GenomeBrowse > Numeric Value Plot for row marker mapped spreadsheets, or GenomeBrowse > New Window and click on the Add button in the middle of the window to add samples in either case.

See Numeric Value Plot for more information.

CNV Segment Mean Covariates

If the segment mean covariates of the log2 ratio data are to be plotted for visually inspecting CNVs, this can be done from the CNV Segment Covariates spreadsheet (preferably the spreadsheet with a column for every marker). To plot the data, go to GenomeBrowse > Numeric Value Plot for row marker mapped spreadsheets, or GenomeBrowse > New Window and click on the Add button in the middle of the window to add samples in either case. See Numeric Value Plot for more information.

A heat map of segmentation covariates is a good tool for visually detecting interesting copy number regions. This can be done with either the segment mean covariates or the discretized covariates. It is useful to first sort the samples by case/control status in order to easily inspect the top and bottom halves of the heat map. To do this, merge the CNV Segment Covariates spreadsheet with a phenotype spreadsheet containing case/control status for the samples, then sort on the column containing case/control status. From this spreadsheet, select GenomeBrowse > Heat Map. See genomicHeatMap for more information.

CNV Segment Means Histogram

Plotting the histogram of segment means can be useful in visually identifying thresholds between copy number states of loss, neutral and gain for a dataset. To plot the histogram, select Plot > Histograms from the CNV Segment List Spreadsheet, and from the Histogram Parameters dialog select the “Segment Mean” column. Additional parameters can be changed from their defaults. See Histograms for more information.

CNV Segment Counts Histogram

Plotting the histogram of segment counts can also be useful in visually identifying noisy samples with a large number of segments. To plot the histogram, select Plot > Histograms from the CNV Segment Counts Spreadsheet, and from the Histogram Parameters dialog select the “Segment Counts” column. Additional parameters can be changed from their defaults. See Histograms for more information.

Log2 Ratios and CNV Segments Together

It can be a useful visual tool to plot CNV segments on top of original log2 ratio data. To do this, make sure a marker map is applied to both the original log2 ratio transposed spreadsheet and the CNV segments transposed spreadsheet.

From the log2 ratio spreadsheet, plot the sample or samples of interest by going to GenomeBrowse > Numeric Value Plot for row marker mapped spreadsheets, or GenomeBrowse > New Window and click on the Add button in the middle of the window to add samples in either case. See Numeric Value Plot for more information.

Next, select the first log2 ratio graph node in the plot tree and right-click to select the Add Item(s) tab. Click on the Project location. In the node selection list, select the CNV segments spreadsheet. Once the samples have been loaded into the plot data panel, select the same sample for the graph that had the log2 ratio values plotted.

In general, it is usually desired to have the CNV segments be plotted as a step line, and to leave the log2 ratio values as points on the plot. To change the options for the segment mean covariate log2 ratios, select the appropriate item under the log2 ratio plot in the Plot Tree panel on the left-hand side of the window. On the Display tab change the line style to Left Step. The line can be brought to the front of the viewer by clicking on the name of the item and dragging it above the first item in the log2 ratio plot container. The items can be renamed by right-clicking on the name and selecting “Edit Title”.

More graphs can be added for additional samples by selecting “Add” in the tool bar or right-clicking in the Plot Tree. Again select the Project location, then select both occurrences of another sample: one from the log2 ratio spreadsheet and one from the segmentation covariate results spreadsheet. The above procedure can be repeated for as many samples as desired. See GenomeBrowse: The Genomic Scale Data Visualization Tool for more information.