2. Importing Variant and Alignment Data

Important

The starter project provided in this tutorial already contains the variants and coverage data for 48 samples. This portion of the tutorial shows how the VCF variant data was imported and how the coverage data was computed on the BAM files, so that you can follow along with your own data instead of the provided project.

If you are already familiar with this process, or will be working with the project provided for this tutorial, please skip to the Running the CNV Caller section of the tutorial.

As mentioned earlier, the VS-CNV algorithm uses changes in coverage relative to a collection of reference samples as evidence of CNV events. To create a set of reference samples to be used as a basis for CNV calling, users can compute coverage on BAM files using the Reference Sample Manager.
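
To illustrate the general idea of measuring coverage relative to reference samples, here is a minimal Python sketch. It is not the VS-CNV algorithm itself: the function names `normalize` and `coverage_ratios`, the per-sample normalization, and the use of a ratio and z-score are assumptions chosen for illustration only.

```python
from statistics import mean, stdev


def normalize(depths):
    """Scale one sample's per-target mean depths by its overall mean depth."""
    overall = mean(depths)
    return [d / overall for d in depths]


def coverage_ratios(sample_depths, reference_depths):
    """Compare a sample to a reference panel, target by target.

    sample_depths: per-target mean depth for the sample of interest.
    reference_depths: one per-target depth list per reference sample
        (at least two reference samples are needed for a standard deviation).
    Returns a list of (ratio, z_score) tuples, one per target.
    """
    sample = normalize(sample_depths)
    refs = [normalize(r) for r in reference_depths]
    results = []
    for i, value in enumerate(sample):
        ref_values = [r[i] for r in refs]
        ref_mean = mean(ref_values)
        ref_sd = stdev(ref_values)
        ratio = value / ref_mean
        # Guard against a zero spread in the (hypothetical) reference panel
        z_score = (value - ref_mean) / ref_sd if ref_sd > 0 else 0.0
        results.append((ratio, z_score))
    return results
```

In a sketch like this, a target whose ratio falls well below 1 with a strongly negative z-score would be a candidate loss, and one well above 1 a candidate gain. The more reference samples contribute to the mean and standard deviation, the more stable those estimates become.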

  • Open VarSeq and click Tools > Manage Reference Samples. This menu computes coverage on BAM files and subsequently adds CNV Reference samples to the reference sample library.

Figure 2-1. Opening the Reference Sample Manager

  • Click the Add References button, then select Add Files on the first screen of the Add CNV References dialog to add sample BAMs.

Figure 2-2. Adding BAM files

  • Ensure that Target Region is selected. Next, click Select Track to browse to the interval track (BED file) that defines the regions over which coverage will be calculated. Note that users can import their own BED files using the Convert Wizard. Once an interval track has been selected, click Create to create a set of reference samples to be used as a basis for CNV calling.

Figure 2-3. Selecting Interval Track

Now that you have added samples to the reference sample set, you can create a VarSeq project and import the samples on which to call CNVs.

  • Open VarSeq and click Create New Project. Select the Empty Project option. Select your genome assembly, enter a name for the project, and click OK.

Figure 2-4. Create New Project Dialog

  • Click on the Import Variants button and select Add Files on the first screen of the Import Variants Wizard.
  • Navigate to the directory where your VCF files are saved and select them for import (see Figure 2-5), then click Next >.

Note

If you do not use the Manage Reference Samples option to import your reference samples as described above, you will need to import enough samples to build your Reference Panel. The recommended minimum is 30 reference samples, so you will want to import at least 31 samples: 30 used for reference and one additional sample for analysis.

Once the 31 samples are processed through the CNV tool, VarSeq will save the coverage profile for these samples in the Coverage Reference Samples folder found in the VarSeq User Data location on your computer (Tools > Open Folder > Reference Samples Folder).

For any subsequent run of the algorithm, you can import any number of samples for analysis, and VarSeq will pull a reference set of samples from those available in the Reference Samples Folder.


Figure 2-5. Select VCF files for Import

  • If importing into an Empty Project, you will need to select your Sample Relationships on the next dialog. For this tutorial, the Cancer Samples option was selected. Click Next >.

Figure 2-6. Select Sample Relationships

On the next dialog we will be associating the BAM files with the imported VCF files so that Targeted Region Coverage can be computed.

  • Click Associate BAM File at the top of the dialog and navigate to the directory where your BAM files are stored. If your BAM files follow a naming convention similar to the sample names listed in the VCF file, they should be associated automatically; if not, manually select each BAM file. Click OK once done.

Figure 2-7. Associate BAM Files to VCFs

The BAM file paths should now be filled out for each sample on the import dialog.


Figure 2-8. BAM Files Associated in Dialog
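
The tutorial does not spell out the exact matching rule VarSeq uses for automatic association. As a rough way to preview which samples are likely to match by name before opening the dialog, the following Python sketch pairs sample names with BAM files. The function `associate_bams` and its prefix-based rule (file name begins with the sample name followed by `.` or `_`) are assumptions for illustration, not VarSeq behavior.

```python
from pathlib import Path


def associate_bams(sample_names, bam_dir):
    """Pair each VCF sample name with a BAM file that shares its name.

    Returns (matched, unmatched): a dict of sample name -> Path, and a list
    of sample names with no match or an ambiguous match, which would need
    to be selected manually.
    """
    bam_files = sorted(Path(bam_dir).glob("*.bam"))
    matched, unmatched = {}, []
    for name in sample_names:
        # Hypothetical rule: "S1" matches "S1.bam" or "S1_sorted.bam",
        # but not "S10.bam".
        candidates = [
            bam for bam in bam_files
            if bam.name.startswith(name + ".") or bam.name.startswith(name + "_")
        ]
        if len(candidates) == 1:
            matched[name] = candidates[0]
        else:
            unmatched.append(name)
    return matched, unmatched
```

Any sample names returned in the unmatched list are the ones to expect to select by hand in the dialog.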

  • Click Next > and Finish to complete the VCF variant data import.

Figure 2-8. VarSeq Variant Table

Next, compute the coverage data required to detect CNVs.

  • Go to Add > Secondary Tables > Add Coverage Regions and follow the directions in the new window to select the interval track that defines the regions over which target coverage is calculated.

Figure 2-9. Adding Coverage Tables.

In the next dialog you will need to select either a BED file or Interval Annotation file that defines the targeted regions in your samples.

  • Click Select Track and navigate to the location of your target region file.

Note

BED files are required to be indexed. If your file does not already have an index (TBI file), one can be computed through the Data Source Library by right-clicking on the file and selecting Computations on Source.
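
If you want to check ahead of time whether a target region file still needs an index, a small Python sketch such as the following may help. It assumes the index is stored next to the file with `.tbi` appended to the full file name, which is the usual tabix convention; whether VarSeq looks for the index in exactly that location is an assumption here, and the function name `missing_tabix_index` is hypothetical.

```python
from pathlib import Path


def missing_tabix_index(track_path):
    """Return True if no '<file name>.tbi' exists beside the track file."""
    track = Path(track_path)
    # Assumed convention: "targets.bed.gz" -> "targets.bed.gz.tbi"
    index = track.with_name(track.name + ".tbi")
    return not index.exists()
```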


Figure 2-10. Computing Coverage on BAMs

Once this computation finishes, you are ready to begin CNV calling.
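
As a rough picture of what target region coverage means, the Python sketch below averages per-base read depth over each interval in a BED file. It is an illustration only and not VarSeq's implementation: the function `mean_depth_per_region` and the dictionary of per-base depth lists it takes as input are hypothetical. BED intervals are 0-based and half-open, so an interval `2 6` covers positions 2 through 5.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from the first three BED columns."""
    chrom, start, end = line.split("\t")[:3]
    return chrom, int(start), int(end)


def mean_depth_per_region(bed_lines, depth_by_chrom):
    """Average per-base depth over each BED interval.

    depth_by_chrom maps a chromosome name to a list of per-base depths,
    indexed by 0-based position (a hypothetical input format).
    Returns a list of (chrom, start, end, mean_depth) tuples.
    """
    results = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = parse_bed_line(line)
        depths = depth_by_chrom.get(chrom, [])[start:end]
        length = end - start
        # Positions missing from the depth list count as zero coverage
        mean_depth = sum(depths) / length if length > 0 else 0.0
        results.append((chrom, start, end, mean_depth))
    return results
```

Per-region means of this kind, computed for every sample, are the sort of values that can then be compared against the reference samples.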