Importing Regions and CNVs

You can import genomic regions from text, VCF, and TSF files. Once imported, the regions or CNVs from these files are available as a region or CNV table that can be used with VarSeq's annotation and filtering algorithms. To import regions or CNVs, you must already have variants imported into your project.

Select Files to Import

The import wizard will step you through all of the import options to bring region or CNV data into VarSeq.

The first step is to select the files to import.

Import Wizard Step 1

Select the files to import on the first step of the import wizard.

When importing text files, the next step is to set the parsing options for the files. If the input files are VCF or TSF files, this information is inferred from the file metadata, and the next step is to associate the files with samples.

Setting the Parsing Parameters for Text Input

When importing text files, you must specify the fields that define the genomic region of each feature. If you are importing CNVs, you must also specify the field that contains the type of each CNV event.

Text Parsing Parameters

Selecting the text input fields that correspond to the required genomic fields and CNV type field.

Three different formats are supported for the input genomic coordinates:

  • 0-Based Intervals - Chromosome positions start at 0. The size of each feature in bp is stop - start. This format requires Chr, Start, and Stop fields.
  • 1-Based Intervals - Chromosome positions start at 1. The size of each feature in bp is stop - start + 1. This format requires Chr, Start, and Stop fields.
  • Position - All of the genomic coordinates are in a single field of the form chr:start-stop. The start value is 1-based, so its smallest value is 1, and the size of each feature in bp is stop - start + 1.
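The three formats describe the same interval in different ways. As an illustration only, the following minimal Python sketch converts each format into a 0-based interval with an exclusive end, so that all three size rules above give the same answer. The Interval type and the from_* helpers are hypothetical names for this example, not part of VarSeq.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical normalized interval: 0-based start, exclusive end."""
    chrom: str
    start0: int  # 0-based start position (inclusive)
    end: int     # exclusive end; equals the 1-based inclusive stop

    @property
    def size_bp(self) -> int:
        return self.end - self.start0


def from_zero_based(chrom: str, start: int, stop: int) -> Interval:
    # 0-Based Intervals: size = stop - start, so no shift is needed.
    return Interval(chrom, start, stop)


def from_one_based(chrom: str, start: int, stop: int) -> Interval:
    # 1-Based Intervals: size = stop - start + 1, so shift start down by one.
    return Interval(chrom, start - 1, stop)


_POSITION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<stop>\d+)$")


def from_position(text: str) -> Interval:
    # Position: a single chr:start-stop field with a 1-based start.
    m = _POSITION.match(text.strip())
    if m is None:
        raise ValueError(f"expected chr:start-stop, got {text!r}")
    return from_one_based(m["chrom"], int(m["start"]), int(m["stop"]))
```

For example, chr1:1001-2000, the 1-based interval 1001 to 2000, and the 0-based interval 1000 to 2000 all describe the same 1,000 bp feature on chr1.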

Select the fields that correspond to the selected coordinate system by clicking the headers of the preview table. If you are importing CNVs, you must also select the Type field, which determines the type of each CNV event.

Note

Each CNV event is assigned one of the following types based on the case-insensitive contents of the type field:

  • Duplication - the text contains “dup”, “gain”, “ins”, “amp”
  • Deletion - the text contains “loss”, “del”
  • Loh - the text contains “loh”

If the field does not contain the required text for one of the supported events, the event is assumed to be diploid and is dropped from the results.
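The type rules above amount to case-insensitive substring matching. The Python sketch below illustrates that rule; the function name classify_cnv_type is hypothetical, and the order in which the rules are tried (which only matters for text that matches more than one rule, such as "del/dup") is an assumption of this sketch, not documented VarSeq behavior.

```python
from typing import Optional

# Substring rules from the note above, checked against lowercased text.
# Trying Duplication first, then Deletion, then Loh is an assumption.
_RULES = [
    ("Duplication", ("dup", "gain", "ins", "amp")),
    ("Deletion", ("loss", "del")),
    ("Loh", ("loh",)),
]


def classify_cnv_type(text: str) -> Optional[str]:
    """Return the CNV type for a raw type value.

    Returns None when no rule matches, i.e. the event is treated as
    diploid and would be dropped from the results.
    """
    lowered = text.lower()
    for cnv_type, keywords in _RULES:
        if any(keyword in lowered for keyword in keywords):
            return cnv_type
    return None


# Usage: keep only the events that map to a supported type.
raw_types = ["DUP", "CN_LOSS", "copy_neutral_LOH", "normal"]
kept = [t for t in raw_types if classify_cnv_type(t) is not None]
```

Because matching is by substring, short keywords such as "ins" also match longer words that contain them, so it is worth checking how the type values in your files will be classified.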

Once the parameters are correctly configured, click the Next button. This applies the selected settings to each of the input files and scans the resulting field types. If the settings can be applied to all of the files to produce the same set of input fields, the wizard moves to the sample association step. Otherwise, you are prompted to provide input settings for each remaining input file.
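The consistency check performed when you click Next can be pictured roughly as follows. This Python sketch is only an assumption about the logic: it parses just the header row with Python's csv module, compares field names rather than field types, and uses hypothetical helper names (header_fields, split_by_settings).

```python
import csv
import io
from typing import Dict, List, Tuple


def header_fields(text: str, delimiter: str) -> Tuple[str, ...]:
    """Parse the header row of a delimited text file with the given settings."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return tuple(next(reader, []))


def split_by_settings(
    files: Dict[str, str], delimiter: str
) -> Tuple[List[str], List[str]]:
    """Apply one set of parsing settings to every file.

    Returns (accepted, needs_settings): the files whose parsed fields match
    the first file's fields, and the files that still need their own settings.
    """
    if not files:
        return [], []
    names = list(files)
    reference = header_fields(files[names[0]], delimiter)
    accepted: List[str] = []
    needs_settings: List[str] = []
    for name in names:
        if header_fields(files[name], delimiter) == reference:
            accepted.append(name)
        else:
            needs_settings.append(name)
    return accepted, needs_settings
```

In this sketch, a comma-delimited file parsed with tab settings yields a single field, so it lands in needs_settings, mirroring the wizard prompting for settings for the remaining files.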

Associating Inputs with Existing Samples

The data in each file selected for import must be associated with a sample that already exists in the project. To do this, select the file from the dropdown list next to the corresponding sample. If you are importing a VCF or TSF file that contains multiple samples, the sample name from the file appears after the file name in the dropdown list.

Associating Samples and Input Files

Selecting the file that corresponds to the sample that is already imported into the project.

After all of the samples and input files have been matched, click the Next button to move to the field behavior selection page.

Setting the Import Field Behavior

By default, all of the fields except the CNV type are imported and merged for each unique region. The CNV type field (if present) is kept separately for each associated sample. You can change a field's merge behavior to control how its values are combined when there are multiple records at the same location.

Changing the field merge behavior

Changing how the values of each field are merged during import.

By default, each field is merged by creating a Unique list of its values across all samples and files, which keeps the field a variant site field. Other merge options include:

  • NumericMax: For integer, integer array, float or float array field types. Takes the maximum of all values for the field in all files.
  • NumericMin: For integer, integer array, float or float array field types. Takes the minimum of all values for the field in all files.
  • NumericMean: For integer, integer array, float or float array field types. Takes the mean of all values for the field in all files.
  • KeepMatching: For all field types. Only keep the value if all files that have a value for the specified field match.
  • TakeFirst: For all field types. Take the first value seen.
  • TakeAll: For all field types. Take all of the field values for all of the merged records and combine them in a list.
  • Sample: For all field types. Takes all of the values for the field and makes them sample specific, based on the sample/input file association.
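To make the differences between the merge options concrete, here is a minimal Python sketch of each one applied to the records for a single field at a single region. It assumes scalar values only (array fields are not handled), represents the records as hypothetical (sample, value) pairs, and treats None as a missing value that every option skips; how VarSeq actually handles missing values and arrays is not stated here.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: one (sample, value) pair per merged record, in read order.
Records = Sequence[Tuple[str, Any]]


def _values(records: Records) -> List[Any]:
    # Skip records with no value for this field (an assumption of this sketch).
    return [v for _, v in records if v is not None]


def unique(records: Records) -> List[Any]:
    # Unique (default): distinct values in first-seen order.
    return list(dict.fromkeys(_values(records)))


def numeric_max(records: Records) -> Optional[float]:
    return max(_values(records), default=None)


def numeric_min(records: Records) -> Optional[float]:
    return min(_values(records), default=None)


def numeric_mean(records: Records) -> Optional[float]:
    vals = _values(records)
    return mean(vals) if vals else None


def keep_matching(records: Records) -> Any:
    # Keep the value only if every record that has a value agrees.
    vals = _values(records)
    return vals[0] if vals and all(v == vals[0] for v in vals) else None


def take_first(records: Records) -> Any:
    vals = _values(records)
    return vals[0] if vals else None


def take_all(records: Records) -> List[Any]:
    return _values(records)


def per_sample(records: Records) -> Dict[str, List[Any]]:
    # Sample: keep the values separate for each associated sample.
    out: Dict[str, List[Any]] = {}
    for sample, value in records:
        if value is not None:
            out.setdefault(sample, []).append(value)
    return out


MERGERS: Dict[str, Callable[[Records], Any]] = {
    "Unique": unique,
    "NumericMax": numeric_max,
    "NumericMin": numeric_min,
    "NumericMean": numeric_mean,
    "KeepMatching": keep_matching,
    "TakeFirst": take_first,
    "TakeAll": take_all,
    "Sample": per_sample,
}
```

For the records [("S1", 3), ("S2", 7), ("S1", 3)], Unique gives [3, 7], TakeAll gives [3, 7, 3], KeepMatching gives None because the values disagree, and Sample gives {"S1": [3, 3], "S2": [7]}.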

Fields can also be dropped from the import by unchecking the Select Field checkbox. Unchecked fields will not be read and imported into the project.

Import Summary

The final page of the import wizard summarizes the import process. To finalize the import, click Finished. This imports the selected files into VarSeq. Note that the import may be completed in several steps.

Requirements

Requires that samples with variants have been imported.

Output

Creates a new table for the imported CNVs or regions.