Frequently Asked Questions¶
Can you walk me through installing and registering SVS 8.4.0 on my Windows machine?¶
Download the SVS executable (e.g., SVS-Win64-8.4.0.exe) to your machine from the link provided and double-click the executable to begin the installation.
- Click Next > on the first dialog (Figure 1-1).
- Select the SVS install directory. The setup wizard will create a folder in this directory called Golden Helix SVS, and the default location is C:\Program Files\. The path and the folder can be changed to the user's preference. Then click Next >. (Figure 1-2)
- The third dialog gives you the option to create a Start Menu folder for SVS. Select your options and click Next >. (Figure 1-3)
- The fourth dialog allows you to create Quick Launch and Desktop icons. Select your options and click Next >. (Figure 1-4)
- The fifth dialog is a summary of the options selected on the previous dialogs. Click Install to finish the install process, or click Back > to make any necessary changes to the selected options.
- Check Launch SVS on the last install dialog and click Finish.
Login and Register Dialog¶
If you are opening SVS for the first time on a machine (or for the first time ever), the first dialog you will see on launching SVS is the dialog to log in or register your SVS license. You may also see this dialog if you chose not to have your credentials saved or have previously logged out of SVS.
If you do not have an existing Golden Helix account, click on the Register tab and fill out the registration form. Once the required fields have been filled in and the license agreement has been accepted, the Register & Log In button will become active. You can optionally uncheck the “Stay logged in” option to have SVS not store credentials locally and require logging in again after relaunching the software. See Figure 1-5.
The Golden Helix account will also be the credentials used to log into the answers.goldenhelix.com community site.
If you have an existing Golden Helix account with a valid SVS license, enter in your email address and password. You can optionally uncheck the “Stay logged in” option to have SVS not store credentials locally and require logging in again after relaunching the software. After the account information is filled in, click Log In. See Figure 1-6.
Activate a License Key¶
Once logged in, if it is the first time logging in on a particular computer, you will need to activate a license key. Either click on Activate License Key in the lower right-hand corner of the Welcome Screen, or go to Help > Activate a SVS License Key. Type or paste in the provided license key and press Verify. Once the key has been verified, the Activate button will become active if there are available activations. To activate SVS for the particular user account and machine, click Activate.
SVS can be used with limited capabilities without an active license key. This is called “Viewer Mode”.
Where are the downloaded genome assemblies and annotation data saved?¶
All user downloaded content is saved in the AppData directory on your computer by default. You can find a listing of the paths from the Welcome Screen of SVS before opening a project by going to Tools > Global Product Options. The paths will be listed under the Applications options list and then in the Custom Paths section.
Any of these paths can be changed by selecting the Browse button and navigating to the preferred directory. For example, since annotation data sources can take up a significant amount of space, it is acceptable to move the tracks to an external or network drive so they do not use up the available disk space on your C drive.
To view the downloaded content in these locations you can go to Tools > Open Folder and select the corresponding folder.
I have a lot of RAM installed on my computer, how can SVS take advantage of that?¶
SVS has two settings that allow users to maximize the amount of RAM that SVS can access. From the SVS Welcome Screen, before opening a project, go to Tools > Global Product Options and look under Memory Usage in the Applications options list.
Dataset memory cache limit - This will most noticeably impact the performance of opening and navigating spreadsheets in SVS. We use this number to keep as much of the dataset in memory as possible. If it’s large enough to encompass the entire dataset then most plotting and analysis operations will see massive improvements on their second and subsequent passes over the data (the first pass brings it into the cache). This setting is a soft memory limit for ALL the datasets in a project though so it will dynamically discard the least recently accessed chunks (columns) of datasets as you navigate around your project looking at different spreadsheets.
Transpose and Analysis Memory Usage - This is a soft threshold for how much memory we should take advantage of when doing a dataset transpose operation or other memory-intensive analysis operations. The only analysis operations that utilize this are ones that are in essence doing an in-place transpose, such as the CNV segmentation algorithm that segments samples (rows) while the data is stored on disk in columns (LogR values at a single genomic position). Some import operations that also do an internal transpose, such as Affymetrix CEL import, use this threshold. Because transpose operations are essentially about reading data from disk in an inefficient manner, the more memory utilized, the fewer disk accesses required and hence the faster the operation. If the whole dataset can be held in the provided limit, the transpose operation will be optimally fast.
We recommend setting these limits to no more than 50% of the total available RAM on the machine, because these thresholds are not, and will never be, representative of the total amount of memory SVS will consume.
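The eviction behavior described for the dataset memory cache limit can be pictured with a small least-recently-used cache. This is only an illustrative sketch and not SVS's implementation: the class name `ColumnCache`, the byte accounting, and the chunk keys are all hypothetical.

```python
from collections import OrderedDict


class ColumnCache:
    """Toy LRU cache with a soft byte limit (hypothetical, not SVS code)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._chunks = OrderedDict()  # key -> bytes, least recent first

    def get(self, key, load):
        """Return the chunk for key, calling load() only on a cache miss."""
        if key in self._chunks:
            self._chunks.move_to_end(key)  # mark as most recently used
            return self._chunks[key]
        data = load()  # first pass: bring the chunk in from disk
        self._chunks[key] = data
        self.used_bytes += len(data)
        # Discard least recently accessed chunks until back under the limit,
        # but always keep the chunk that was just requested.
        while self.used_bytes > self.limit_bytes and len(self._chunks) > 1:
            _, evicted = self._chunks.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data

    def cached_keys(self):
        """Keys currently held, least recently used first."""
        return list(self._chunks)
```

If the limit is at least as large as the whole dataset, nothing is ever evicted and every access after the first is served from memory, which is the situation the text describes as giving the largest speedup.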
I have downloaded an add-on script how can I use it?¶
Once you have downloaded the script to your computer from our Scripts Repository you will need to save it in your SVS User Scripts location which can be found by going to Tools > Open Folder > User Scripts Folder.
The user scripts directory is arranged to mirror the SVS menu structure so that scripts can be accessed similarly to tools that are already available within the software. Any script that needs to be run from a spreadsheet will be saved in the corresponding Spreadsheet folder, and any script that needs to be run from the SVS Project Navigator will be saved in the corresponding SVS folder. All scripts available from our website list the recommended directory location both on the website and in the included PDF documentation. An example is given below.
This script needs to be run from a spreadsheet with numeric values so it will be saved in the /Spreadsheet/Numeric/ folder.
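Placing a script by hand is usually simplest, but the same step can be sketched in Python. The function name `install_script` is hypothetical, and the user scripts root must be supplied by the caller (it is the folder opened by Tools > Open Folder > User Scripts Folder); nothing here is part of SVS itself.

```python
import shutil
from pathlib import Path


def install_script(script_path, user_scripts_root, menu_folder):
    """Copy a downloaded script into a menu-mirroring subfolder.

    user_scripts_root: the folder opened by
        Tools > Open Folder > User Scripts Folder (supplied by the caller).
    menu_folder: the recommended location from the script's documentation,
        for example "Spreadsheet/Numeric".
    """
    target_dir = Path(user_scripts_root) / menu_folder
    target_dir.mkdir(parents=True, exist_ok=True)  # create missing subfolders
    return Path(shutil.copy2(script_path, target_dir))
```

The function returns the path of the copied file so the result can be checked before relaunching SVS.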
How can I back-up my SVS project or recover data from a corrupt project?¶
All data from an SVS project is stored in the project folder. To find the save location go to Tools > Open Folder > Project Folder. This location contains the project file (Project_Name.ghp) as well as several folders (coverage, Data, genomes, map, tmp, etc.) that contain the actual data that is inside of the project.
The easiest way to back up an SVS project is by making a copy of the project through the SVS menu options. From the open project in SVS, go to File > Save a Copy of Project. This will prompt you for the save location and the name of the copy.
If you are unable to open an SVS project, there are a couple of options to recover the data. Open the project folder (Tools > Open Folder > Project Folder).
When a project is opened in SVS a temporary project file Project_Name.ghp.tmp will be created in the same location as the original file. You can try renaming this file by removing the .tmp extension and then opening the renamed file.
All of the project data (except plots) is stored in the /data/ and /map/ folders at this location. Any spreadsheet data will be in the /data/ folder in DSF format, and all marker map files will be in the /map/ folder in DSM format.
The DSF files can be dragged and dropped into a new project to recreate the spreadsheets. See Golden Helix DSF File.
The DSM files will need to be moved to your Marker Maps folders (Tools > Open Folder > Marker Maps Folder) and then reapplied to each spreadsheet (File > Apply Genetic Marker Map).
When you use the raw data stored in the project folder to recreate your project, all spreadsheet order and child/parent relationships between the spreadsheets will be lost, because each DSF file will create a Top-Level spreadsheet node in the new project in the order in which the files are imported.
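Before attempting a recovery, it can help to list what is actually present in the project folder. The following read-only sketch assumes the files carry `.ghp.tmp`, `.dsf`, and `.dsm` extensions (the extensions for DSF and DSM files are an assumption based on the format names); the function name `inventory_project` is hypothetical.

```python
from pathlib import Path


def inventory_project(project_dir):
    """List recoverable files in an SVS project folder without changing it."""
    root = Path(project_dir)
    found = {"tmp_project_files": [], "spreadsheets": [], "marker_maps": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".ghp.tmp"):
            found["tmp_project_files"].append(path)
        elif name.endswith(".dsf"):  # assumed extension for DSF spreadsheets
            found["spreadsheets"].append(path)
        elif name.endswith(".dsm"):  # assumed extension for DSM marker maps
            found["marker_maps"].append(path)
    return found
```

Working from a copy of the project folder rather than the original keeps the damaged project untouched while you experiment.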
How can I activate my variants based on regions defined in a BED file?¶
For example, consider a BED file with the following contents:
GeneName  TranscriptName  Chr  StartPosition  EndPosition  Strand
ATAD3B    NM_031921       1    1407164        1431582      +
NADK      NM_001198995    1    1682671        1690081      -
PODN      NM_001199080    1    53527724       53551174     +
PODN      NM_001199081    1    53527885       53551174     +
CSF1R     NM_005211       5    149432854      149492935    -
NKX2-5    NM_001166175    5    172659107      172662315    -
BDNF      NM_170733       11   27676442       27723180     -
DZANK1    NM_001099407    20   18364011       18447829     -
TLR8      NM_138636       X    12924739       12941288     +
SVS can annotate or filter directly from the BED file. From your variant spreadsheet, go to DNA-Seq > Annotate and Filter Variants, Add the BED file (GeneRegion) in the Select Source dialog, and then click Next >.
Then, on the options dialog, select to filter variants not in overlapping regions and select your options for the information you would like included in the annotation output spreadsheet.
In the original spreadsheet, only those variants in the defined regions will be left active, and a subset spreadsheet will be created for those active variants.
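The effect of the filter can be illustrated with a short sketch of region overlap. This is not how SVS implements the filter; the function name `active_variants` is hypothetical, region bounds are treated as inclusive (an assumption, since BED conventions differ), and the variant positions are made up for the example.

```python
def active_variants(variants, regions):
    """Return one True/False flag per variant: True if it lies in any region.

    variants: list of (chrom, position) tuples.
    regions:  list of (chrom, start, end) tuples; bounds are treated as
              inclusive here, which is an assumption of this sketch.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        any(start <= pos <= end for start, end in by_chrom.get(chrom, []))
        for chrom, pos in variants
    ]


# Two regions taken from the example file, plus made-up variant positions.
regions = [("1", 1407164, 1431582), ("5", 149432854, 149492935)]
variants = [("1", 1410000), ("1", 2000000), ("5", 149432854), ("X", 12930000)]
flags = active_variants(variants, regions)  # [True, False, True, False]
```

Variants flagged True correspond to the rows that stay active; the rest would be inactivated.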
How can I add Gene Name or RS ID to my spreadsheet’s marker map?¶
We have an add-on script Add Annotation Data to Marker Map that can do this for you.
- First, download the script from the scripts webpage and save it in the recommended location in your User Scripts folder. See I have downloaded an add-on script how can I use it? for assistance.
- Then download a local copy of the annotation track that contains the information you want added to your map: go to Tools > Manage Data Sources, select the track from our Public Annotations repository, and click Download in the lower left corner.
  - To add gene names, any of our gene annotation sources can be used. For example, if you have human data from the GRCh_37_g1k build, then Ensembl Genes 75, Ensembl can be used to add Ensembl gene names to an existing marker map.
  - To add RS IDs, any of the dbSNP annotation sources can be used, for example dbSNP 138, NCBI.
- Now launch the script from your spreadsheet and select the downloaded annotation source. This tool will create an augmented marker map in your marker maps folder, so give the new map an informative name and click Next >.
- Choose the field in the track that contains the required information. For RS ID, the Identifier field from dbSNP should be used, and for gene name the corresponding name field should be used.
This script can only add one field at a time to the marker map, so if you would like to add additional fields from the track you will need to repeat this process for each subsequent field.
I get a warning about duplicate markers when trying to append two datasets together, how can I inactivate these duplicates?¶
We have a script Inactivate Duplicate Column Headers that can solve this issue.
- First download the script from the webpage and save it in the recommended location in your User Scripts folder. See I have downloaded an add-on script how can I use it? for assistance.
- Then from each spreadsheet you will be appending run the script to inactivate the duplicates. The script will keep the first appearance of the duplicate and inactivate each subsequent appearance of that column header.
- Once the duplicates are inactive you should be able to run the Append Spreadsheet function without error.
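The keep-first rule that the script applies can be sketched as follows. This is an illustration of the rule only, not the script's actual code; the function name `duplicate_mask` and the sample headers are hypothetical.

```python
def duplicate_mask(headers):
    """Return one flag per column: True = keep active, False = inactivate."""
    seen = set()
    mask = []
    for header in headers:
        mask.append(header not in seen)  # first appearance stays active
        seen.add(header)
    return mask


headers = ["rs1", "rs2", "rs1", "rs3", "rs2"]
mask = duplicate_mask(headers)  # [True, True, False, True, False]
```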
I need to append reference samples to my data to perform PCA but my marker labels are different in each spreadsheet.¶
When joining datasets in SVS with either the Append Spreadsheets or Join or Merge Spreadsheets functions, you are required to have matching row labels and column headers for the information you would like combined between the two sets of data.
The reference dataset uses RS IDs for marker names and the study population has the same information available in the marker map.
We will rename the study markers to be RS IDs so that the two sets can be correctly joined together. Please see How can I add Gene Name or RS ID to my spreadsheet’s marker map? if your data does not currently have RS IDs included as a marker map field.
From the study spreadsheet go to Edit > Recode > Rename Marker Mapped Labels and select the RS ID field from the marker map.
A new spreadsheet is then created with RS IDs as the marker names which can now be joined to your reference samples.
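Why the rename makes the join possible can be seen in a small sketch. The function names, the marker map layout (a dictionary keyed by the current marker name), and all marker names and IDs below are made up for illustration and do not reflect SVS internals.

```python
def rename_markers(marker_names, marker_map, field):
    """Replace each marker name with a marker map field value when present."""
    renamed = []
    for name in marker_names:
        value = marker_map.get(name, {}).get(field)
        renamed.append(value if value else name)  # keep old name if no value
    return renamed


def shared_markers(study_names, reference_names):
    """Markers present in both datasets, in study order."""
    reference = set(reference_names)
    return [name for name in study_names if name in reference]


# Hypothetical study markers labeled by position, with RS IDs in the map.
study = ["1:1000", "1:2000", "1:3000"]
marker_map = {"1:1000": {"RS ID": "rs111"}, "1:2000": {"RS ID": "rs222"}, "1:3000": {}}
reference = ["rs222", "rs111", "rs999"]

before = shared_markers(study, reference)
after = shared_markers(rename_markers(study, marker_map, "RS ID"), reference)
```

Before the rename no labels match the reference set; afterwards the markers with RS IDs line up and can be combined.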
How can I import Illumina data for analysis in SVS?¶
Illumina provides data in several formats: raw intensity data files (.idat), Final Report text files, and variant call data in VCF format, to name a few.
Intensity Data Files (idat)¶
SVS does not support importing idat files directly. The data must first be processed through Illumina’s GenomeStudio or BeadStudio software. Once the data is imported into one of these programs, it can be exported into a format that SVS can accept.
Supported export formats are either the Illumina Final Report format or in Golden Helix DSF format using the Plug-ins we have available.
Illumina Final Report Files¶
Illumina Final Report files are delimited text files that can come with a variety of information, including genotypes for several strands, Log R Ratios, B Allele Frequencies, GC Scores and mapping information. For import into SVS at minimum the file must contain SNP and Sample columns and one other information field to be imported. The files can come in one sample per file or multiple samples in the same file.
The header data (lines between [Header] and [Data] in the above screenshot) are not required and will not be included in the standard import. The column header line (starts with “SNP Name”) is required to correctly identify which columns contain the required information to build the SVS spreadsheet and correctly match up corresponding alleles to form the genotypes.
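To check that a file has the layout described here before importing it, a minimal sketch along these lines can be used. It is not the SVS importer: the function name `read_final_report` is hypothetical, the default tab delimiter is an assumption, and the column names other than "SNP Name" in the sample are assumptions about a typical export.

```python
import csv
import io


def read_final_report(text, delimiter="\t"):
    """Parse Final Report text into a list of row dictionaries."""
    lines = text.splitlines()
    # Skip the optional [Header] block: the data begins at the column
    # header line, which starts with "SNP Name".
    start = next(
        i for i, line in enumerate(lines) if line.startswith("SNP Name")
    )
    body = "\n".join(lines[start:])
    return list(csv.DictReader(io.StringIO(body), delimiter=delimiter))


sample = (
    "[Header]\n"
    "Num Samples\t1\n"
    "[Data]\n"
    "SNP Name\tSample ID\tAllele1 - Top\tAllele2 - Top\n"
    "rs1\tS1\tA\tG\n"
    "rs2\tS1\tC\tC\n"
)
rows = read_final_report(sample)
```

If no line starts with "SNP Name", `next` raises `StopIteration`, which signals that the column header line is missing from the file.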
If your data comes in one large Final Report file then you will want to go to Import > Illumina > Import Single Illumina Final Report and follow the prompts to import the data. If your data comes in several files, either one sample per file or several unique samples per file, then you will want to go to Import > Illumina > Illumina Final Report to import the data.
Please see Illumina for further details.
If you have data in a similar format but without the column header information you can use our Import Tall Skinny Format script to import the data. This script is restricted to one file at a time for import, so if you have multiple files it may be easiest to add in the column header line to each file so the Illumina Final Report tool can be used.
Can I analyze my sequencing data with SVS?¶
DNA Sequence Data¶
SVS can accept BAM files for visualization of aligned sequence data and can perform analysis on variant call data provided in VCF format. If you only have the raw sequence data (FASTQ) available for your samples you will need to have a Secondary DNA-Seq pipeline perform alignment and variant calling before SVS can be used for analysis.
We have several blog posts available that discuss NGS Analysis including the tools and formats that are available to process your data for Tertiary Analysis with SVS.
RNA Sequence Data¶
Similarly to DNA-Seq data, SVS can accept BAM files for visualization of RNA-Seq data but requires gene (or isoform) count data to perform further analysis, for example if performing differential expression analysis using the DESeq Analysis. Count data is generally provided by a Secondary RNA-Seq pipeline in some form of delimited text.
This type of data can easily be imported into SVS using our standard Import > Text or Import > Third Party tools. Once the data is imported, the genomic mapping information (chromosome, start position and stop position) must be converted to a genetic marker map and applied to the count data. You can find an example workflow for creating and applying genetic marker maps in the Marker Map Tutorial available on our website.
However, the easiest way to import count data is to format it according to the requirements of our RNA-Seq Tabularized Quantification import tool. The importer will automatically import and correctly format all of the data, including the marker map information, so it is directly available for analysis.
If your count data was provided by Cufflinks in one delimited text file per sample then we have an add-on script available that can convert this output to our Tabularized Quantification format. Please email Golden Helix Support if you need access to the script.