Release Notes

8.7.2

New Features

  • Enhancements were made to the GBLUP analysis using some of the techniques of GCTA. The most significant of these is the ability to correct for Gene by Environment interactions. Also, the kinship matrices computed by GBLUP can be broken out by gender when using the AI REML method. See Genomic Best Linear Unbiased Predictors Analysis for more details.
  • New in the spreadsheet Genotype menu is Genetic Correlation of Two Traits using GBLUP. This performs a bivariate REML analysis on two selected traits to estimate the genetic variance of each trait and the genetic covariance between the two traits that can be captured by all SNPs.
  • We have added LD Score Regression to the Genotype menu to determine whether an inflated Lambda (genomic inflation factor) from your genotype association test is due to underlying population structure in your data or to a polygenic architecture of your trait.
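LD Score Regression separates these two explanations by regressing each marker's association chi-square statistic on its LD score. Under the LD score model, E[χ²] = N·h²·ℓ/M + N·a + 1, so polygenic signal raises the slope while confounding such as population stratification raises the intercept above 1. The Python sketch below is a minimal, unweighted illustration of that idea, assuming you already have per-marker chi-square statistics and LD scores. The function name is hypothetical, and the sketch omits the regression weights and block-jackknife standard errors that a full implementation uses, so it should not be read as the SVS implementation.

```python
import numpy as np


def ld_score_regression(chi2, ld_scores):
    """Unweighted LD score regression: chi2_j ~ intercept + slope * ld_j.

    Returns (intercept, slope, ratio), where ratio = (intercept - 1) / (mean chi2 - 1)
    estimates the share of the inflation in mean chi2 that is due to confounding
    rather than polygenic signal (nan when mean chi2 <= 1).
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    # Design matrix: a column of ones (intercept) and the LD scores (slope).
    design = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    mean_chi2 = chi2.mean()
    ratio = (intercept - 1.0) / (mean_chi2 - 1.0) if mean_chi2 > 1.0 else float("nan")
    return float(intercept), float(slope), float(ratio)


# Hypothetical example: statistics generated with an intercept of 1.0 (no confounding).
ld = np.linspace(1.0, 100.0, 50)
print(ld_score_regression(1.0 + 0.02 * ld, ld))
```

An intercept close to 1 with a ratio near 0 suggests the inflation reflects polygenic signal; a ratio near 1 suggests most of it comes from confounding.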

Polishes

  • When annotating a variant spreadsheet with transcripts (a gene annotation source), you can now request single-letter amino acid notation instead of three-letter notation.
  • The Import Third Party Formats wizard now imports the SAS Data Files created by the Save as Third Party Format exporter.
  • The new Create Labels from Marker Map Field item in the spreadsheet Edit menu makes it easy to change your labels to the RSIDs in your marker map.

Bugs Fixed

  • When importing VCF files, rows that had an ALT field of <CNV> and a defined END value were being interpreted as GVCF span fields instead of being ignored on import.
  • Public data sets with a “+” in their name were not downloading from the Download > Public Data menu.

8.7.1

New Features

  • The Genotype Imputation method has been updated to optionally use the pedigree data provided by the input spreadsheet during phasing and imputation. This yields more accurate haplotype matches for related samples and ensures that output genotypes are free of Mendelian errors.
  • The BEAGLE 4.1 phasing algorithm has also been optimized to provide a modest performance improvement.
  • The PhoRank algorithm has been updated extensively to improve the ranking of genes relevant to sample phenotypes. Input phenotypes have also been extended to include OMIM-provided syndromes and phenotypes; the OMIM content add-on is required for this feature. (See PhoRank Gene Ranking for further details.)
  • A number of updates and improvements have been made to the Annotate Variant Effect on Transcripts annotation algorithm that provides per-transcript variant interactions.
    • New fields in the Variant Interaction Report provide the reference and alternate amino acid for each transcript in the interactions table. Additionally, the number of exons and the amino acid position are provided.
    • The HGVS p. field was updated to use the long form for synonymous and frameshift variants. For synonymous variants, this includes the reference amino acid; for frameshift variants, it includes the amino acid in the altered sequence.
    • Intronic variants now have their nearest exon reported in the Exon Number field, and a new field 5’ Exon Number can be used to determine which two exons the intronic variant is between.

Polishes

  • A number of updates have been made to the Genotype Imputation tool to improve usability.
    • Added format checks to verify that spreadsheets contain marker-mapped genotype data before the tools can be launched.
    • Added the option to create the reference panel as a spreadsheet, for users who want to fill in missing genotypes or correct genotyping errors in their target data.
    • When running imputation, an additional option allows all input markers to be passed through to the output, even when no matching markers are found in the reference panel.
  • Recode SNPs to Variants now allows the RSID to be specified by the column header and not just marker map fields.

8.7.0

New Product Add-Ons for SVS Server

  • Genotype Imputation, using an adaptation of the BEAGLE 4.1 program, is now available with an SVS Server license. See Genotype Imputation for more details.

Note

If you are interested in adding the new Genotype Imputation product to your SVS server license, please contact your account manager or info@goldenhelix.com.

New Features

  • Added PhoRank algorithm to rank genes based on their relevance to user-specified phenotypes. See PhoRank Gene Ranking for more information.
  • Added the ability to recode AB genotypes to AGCT genotypes using the new Recode SNPs to Variants tool. See Recode SNPs as Variants for more information.
  • Premade human reference panels can now be downloaded from the Golden Helix server through Download > Imputation Data. These sources are filtered versions of phased genotype data provided by BEAGLE and are designed to be used with the new Genotype Imputation tool.

Bugs Fixed

  • Fixed issue in Convert GFF3 Files to Annotation Track script that caused multiple transcripts in a single gene to be collapsed into a single record.
  • Fixed an issue with the subset spreadsheet output of the Filter Samples by Call Rate tool that occurred when the original genotype spreadsheet was sorted.
  • Fixed the crash that occurred for Genotype Statistics by Marker and Genotype Filtering by Marker when classifying alleles based on Reference and Alternate.
  • The Numeric Value Plot item in the spreadsheet's GenomeBrowse menu now plots the selected value field instead of just the first column in the spreadsheet.
  • K-Fold Cross Validation using the GBLUP method will now run with a binary trait when selecting more than two iterations.
  • Fixed an error in the Find de Novo Candidate Variants script that occurred when the option to treat missing genotypes as reference was unchecked and multiple families existed in the dataset.
  • Updated Import VCFs and Variant Files to better report half-called genotypes with missing alleles.
  • Save as Image functionality now recognizes manual y-range changes when a GenomeBrowse plot is first created.
  • Fixed import of family samples in the Import VCFs and Variant Files tool.
  • Fixed error with Import Impute2 GWAS Files when importing multiple files from the same chromosome and including INFO files.
  • Fixed a packaging issue on Mac that prevented CNAM Optimal Segmenting from being able to run in OpenCL accelerated mode.

Polishes

  • Updated the export to Variant Call Format (VCF) script to handle half-called genotypes.
  • Random Effects Components are now included in the output for each fold as a part of the estimates by sample when running K-fold Cross Validation.
  • Annotation sources created using the Python API now create files compatible with older product versions.
  • Updated Import VCFs and Variant Files tool to include import of VCF files without sample level information.
  • When exporting data as a VCF file using the File > Save As... > Variant Call Format (VCF) tool, you can now select the alternate allele field in the marker map to be used to create the VCF file.

8.6.0

New Features

  • New Secure Annotation sources available:
    • CADD variant scores. (See CADD for more information)
    • OMIM Genes, Phenotypes and Variants. (See OMIM for more information)
    • MedGenome’s Oncology Mutation Database (OncoMD). (See MedGenome OncoMD for more information)
  • The previous Annotate and Filter Variants as well as Variant Classification methods have been replaced by a new wizard under DNA-Seq > Annotate and Filter Variants. (See Annotate and Filter Variants for more information).
  • Added File > Add Alternates to Marker Map to allow users to add an alternate allele field to the marker map of the current spreadsheet. This field is required by the new variant annotation tool.

Bugs Fixed

  • Fixed issue with Mixed Linear Model Analysis and Mixed Linear Model Analysis with Interactions when adding genotypes as covariates to the model.
  • Added a variant collapsing transform to the Import > Import VCFs and Variant Files tool to fix an issue with duplicate column entries.

Polishes

  • Renamed the GenomeBrowse button and menu item for adding plots to “Plot”.

  • Made several usability changes to the Import > Import VCFs and Variant Files tool including the following:

    • The annotation source selected by default for the region subset option is now the current RefSeq Genes source for the specified project assembly. The dialog also displays the source name, with the path to the source shown as a tooltip.
    • The import algorithm will now recognize <*> in gVCF files as a spanning alternate allele that can be used to fill in reference calls when importing multiple samples together.
    • Updated the naming of sample-level fields for family samples.

Note

If you are interested in using the new Secure Annotation sources, contact Golden Helix support or your account representative directly. These are enabled as add-on features to your license.

8.5.0

New Features

  • The GBLUP and K-Fold cross validation methods have been extended with a “Large-N” mode that allows them to scale to sample sizes (over 8,000) that were previously prohibitive due to the quadratic growth of the memory requirements of the exact method. See Large Kinship Matrices or Large Numbers of Samples.
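As a rough illustration of why exact methods run out of memory, the sketch below estimates the storage for one dense N × N matrix of double-precision values (8 bytes each). It assumes the exact method holds at least one kinship-sized matrix in memory; the actual SVS footprint will differ, since a REML fit is likely to need several matrices of this size at once. The helper name is hypothetical.

```python
def dense_matrix_bytes(n_samples: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store one dense n_samples x n_samples matrix."""
    return n_samples * n_samples * bytes_per_value


# Memory per matrix grows with the square of the sample count.
for n in (2_000, 8_000, 32_000):
    gib = dense_matrix_bytes(n) / 2**30
    print(f"{n:>6} samples: {gib:6.2f} GiB per N x N matrix")
```

Doubling the number of samples quadruples the memory needed for each such matrix, which is why a separate approach is needed for large sample sizes.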

Bugs Fixed

  • Fixed the human GRCh_38 assembly so each chromosome had the correct length specified.
  • Fixed issues with computing lengths of chromosomes when converting FASTA data to create a new genome assembly.
  • Fixed haplotype block spreadsheet selection dialog logic to only activate haplotype block spreadsheets.
  • Fixed bug that caused crash when plotting numeric plots in GenomeBrowse with inactivated columns.
  • VCF export now correctly handles insertions and deletions in scaffold chromosomes.
  • Because K-fold Cross Validation for Genomic Prediction changes the row state as it creates row subsets, it now automatically subsets a spreadsheet that has inactivated rows.
  • Fixed issue with manually specifying and saving affection status during VCF import on Mac OSX.

Polishes

  • Added deprecated flags to the Data Source Library so that older annotation sources are hidden automatically when the Current box is checked.
  • Added the parameter ghi.const.MemoryWorkSize, which gives the amount of RAM allocated as the working memory size.
  • Improved speed of adding a large number of files to the VCF importer.
  • Changed the button to add items to the GenomeBrowse plot view to “Plot”

8.4.4

Bugs Fixed

  • Fixed issue with email verification when name has unicode characters.
  • Fixed bug that caused Annotate and Filter to annotate without list fields in tracks
  • Fixed issue with Count Variants per Gene tool caused by variants listed inside non-standard chromosomes.

Polishes

  • Updated output for Haplotype Association Tests and Haplotype Trend Regression to include the markers used for each test.
  • Haplotype Association Tests and Haplotype Trend Regression now prevent all block options from being turned off, instead of giving an error when the algorithm is run.
  • Updated Bayesian Options for the K-Fold Cross Validation tool to allow for selection of Computation Methods used for the analysis.
  • Adjusted gene collapsing for all DNA-Seq > Collapsing Methods so that non-overlapping transcripts are no longer combined into one region.

8.4.3

Bugs Fixed

  • Adjusted Data Source Library Ctrl+A / Ctrl+Shift+A behavior to modify only items in the current view.
  • Fixed a bug where the Transcript Set options for Variant Classification were not recognized.
  • Fixed a bug where PLINK import caused the project to become unresponsive on OS X.
  • Updated Text Export to show progress dialog.
  • Fixed issue with Convert Wizard that caused the data preview to be missing for all supported file types.
  • Fixed issue with loading feature information into the Details view when clicking on sources in GenomeBrowse.
  • Fixed color by variable issue in the Plot Viewer when variable name contains an apostrophe.
  • Fixed affection status selection for VCF Import with more than 1000 samples.
  • Fixed error when selecting gene count output for DNA-Seq > Variant Classification tool.
  • Fixed the following issues with DNA-Seq > Annotate and Filter Variants:
    • Allow annotation sources to contain non-unicode characters
    • Allow annotation for interval sources without a Name field

Polishes

  • Updated Import Single Illumina Final Report to guess data grouping.
  • Added no-effect line to Meta Analysis Forest Plot
  • Prompt dialogs with a long list of tabs now use a dropdown instead of a wide tab widget.
  • Optimized confidence interval computation for Genotype > Quality Assurance and Utilities > Fixation Index Fst to use multiple threads for computation.
  • Optimized drawing of Manhattan plots in GenomeBrowse to improve speed of drawing and redrawing at the whole genome scale.
  • Updated the File > Create Marker Map from Spreadsheet script to allow the chromosome column to be of integer type.

8.4.2

Bugs Fixed

  • Import VCFs and Variant Files bug fixes:
    • Fixed import of additional spreadsheets for the Mac version of SVS.
    • Fixed issue with specifying affection status from a sample entity text file.
  • Prevent progress dialog from popping up when a Python script prompt dialog is open if a progress loop was running before the dialog was created.
  • Prevent crash when opening the Data Source Library if the source tree contains a local source that was watching a folder that was deleted while the program was closed.
  • Fixed an issue merging marker map fields with a case mismatch in the name when outputting results for Variant Classification.
  • Ensured that all executables have the correct permissions for Linux x64 and RHEL builds (aria2c, assistant, etc.).
  • In GenomeBrowse for BAM alignment plots, remember the edited value for Filter Multi-Mapped Alignments when the option is checked and unchecked.
  • Fixed the Import Illumina Final Report Python API so that it no longer crashes when creating the spreadsheets. This also fixes the crash in the add-on Illumina Text File Wrapper Script.
  • Fixed issue with Annotate and Filter Variants tool when selecting full dbNSFP tracks for version 2.9 and 3.0.

Polishes

  • Added option to output bin Betas and Standard Errors for CMC with Regression.
  • When visualizing annotation sources in GenomeBrowse, labels are now set in the following order of preference: “Identifier” > “Ref/Alt” > “Gene Name” > “Name”.
  • Allow indexing of string array fields for annotation sources. This supports querying against these fields in GenomeBrowse.
  • Added HTML format flags to the annotation Source Editor so that visualization of these fields can be improved through HTML formatting.
  • Allow coverage computation in GenomeBrowse for VCF files that do not contain a Genotype (GT) field but do have other sample-level FORMAT fields.
  • Provide option to ignore data extents warning when converting a data file to an annotation source.
  • Renamed Annotation Download Window buttons to make it clear that downloaded tracks will not be deleted through this dialog.

8.4.1

New Features

  • Added support for Gene x Environment (GxE) Interaction terms in Numeric Regression. See Numeric Regression Analysis for more information.
  • Added option to center genotypes for Bayes C/C-pi.
  • In viewer mode, the welcome screen now has a link to access example projects.
  • Added the ability to plot a 3D Scatter Plot for numeric data with an optional grouping variable. See Scatter Plot 3D for more information.
  • Now computing an overall Odds Ratio and confidence interval for Meta Analysis. Also added PP/QQ output option to the log output.
  • Updated the software updater, which now downloads updated files faster and allows the update to be installed immediately or at a later time.
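A Gene x Environment term lets the genotype effect depend on the value of an environmental variable. The sketch below shows the general idea for the Numeric Regression item above, assuming an additive 0/1/2 genotype coding and an ordinary least squares fit of y = b0 + bG·G + bE·E + bGxE·(G·E). The function name is hypothetical, and it is not SVS's implementation: it returns only the coefficients, without the standard errors or p-value for the interaction term that an actual test reports.

```python
import numpy as np


def fit_gxe(y, genotype, environment):
    """Ordinary least squares fit of y = b0 + bG*G + bE*E + bGxE*(G*E).

    genotype: additive allele counts (0, 1, 2); environment: numeric covariate.
    Returns the coefficients [b0, bG, bE, bGxE]; bGxE measures how the
    genotype effect changes per unit of the environmental variable.
    """
    y, g, e = (np.asarray(a, dtype=float) for a in (y, genotype, environment))
    # Columns: intercept, main genotype effect, main environment effect, interaction.
    design = np.column_stack([np.ones_like(g), g, e, g * e])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical noise-free data generated with coefficients [1, 2, 3, 4].
g = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
e = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)
print(fit_gxe(1 + 2 * g + 3 * e + 4 * g * e, g, e))
```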

Bugs Fixed

  • Double-clicking on a GHP project file now launches SVS and opens the project. If you are not logged into SVS you are first asked to login.
  • Merging TSF files in the Convert Wizard now correctly merges the segment list.
  • Do not output major and minor alleles when performing Genotype Principal Component Analysis.
  • Show DNA-Seq and RNA-Seq menus on MacOS X for spreadsheets with and without genomic marker maps applied.
  • Fixed crash when outputting the clusters of runs for ROH.
  • Fixed crash when closing SVS with the Data Source Library open.
  • Fixed Y-axis zoom in GenomeBrowse when not in automatic mode.
  • Enabled access to Proxy Settings when not logged into SVS.
  • Fixed string list fields in the dbNSFP matched variant report.
  • Fixed computations of visible haplotype blocks in LD Plot window.
  • Fixed a typo in the Meta-Analysis documentation for the sample-size-based approach. Φ⁻¹ was described as “the inverse of one minus the probability density function of the normal distribution” when it should be “the inverse of one minus the cumulative distribution function.”
  • Fixed searching indexed fields in GenomeBrowse when annotation sources are loaded from Public Repository.
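For context on the documentation fix above, a sample-size-based meta-analysis converts each study's p-value into a signed Z-score using Φ⁻¹, the inverse of the standard normal cumulative distribution function Φ, and combines the Z-scores with weights proportional to the square root of each study's sample size. The sketch below assumes two-sided p-values, a per-study effect direction, and the common √N weighting (used, for example, by METAL); it illustrates the general technique and is not a transcription of the SVS documentation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def sample_size_meta(p_values, directions, sample_sizes):
    """Combine two-sided p-values (0 < p <= 1) with sqrt(N)-weighted Z-scores.

    For study i: Z_i = Phi^{-1}(1 - p_i / 2) * sign_i and w_i = sqrt(N_i), where
    Phi^{-1} is the inverse of the standard normal cumulative distribution function.
    Combined Z = sum(w_i * Z_i) / sqrt(sum(w_i ** 2)); returns (Z, two-sided p).
    """
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p, direction, n in zip(p_values, directions, sample_sizes):
        z = STANDARD_NORMAL.inv_cdf(1.0 - p / 2.0)
        if direction < 0:
            z = -z
        w = math.sqrt(n)
        weighted_sum += w * z
        weight_sq_sum += w * w
    z_meta = weighted_sum / math.sqrt(weight_sq_sum)
    p_meta = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


# Hypothetical example: two studies of 1,000 and 4,000 samples, same direction.
print(sample_size_meta([0.04, 0.01], [+1, +1], [1000, 4000]))
```

Studies with opposite effect directions cancel each other out, which is why the direction must be carried alongside each p-value.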

Polishes

  • Allow multiple instances of SVS and other Golden Helix products to download files from the Data Source Library simultaneously.
  • Added the following 3rd party licenses to the about dialog: asa136, newmat, and boost.
  • Converted movable toolbars to fixed “flow” toolbars for GenomeBrowse and adjusted the order of some toolbar items.
  • Added option to inactivate missing values in the Activate by Threshold column tool.
  • When applying marker maps to a spreadsheet a successful completion message will show before the final spreadsheet becomes visible to guarantee the new spreadsheet appears as the top view.
  • Added sample size warnings to the Identity by Descent Estimation, GBLUP Genomic Relationship Matrix and Bayesian Genomic Prediction tools, warning that the tool may not finish if there are too many samples in the dataset.
  • On Lin64, renamed libstdc++.so.6 to libstdc++.so.6.rename-if-needed so that SVS will run on Ubuntu 15.04.
  • Allow offline activation of SVS license when server connection is refused by local network.
  • Joining, Merging and Appending spreadsheet tools now have the option to allow for case insensitive matching for row labels and column headers.

8.4.0

New Features

  • Golden Helix SVS now has a login and registration step when first starting the software. If you do not already have a Golden Helix account, you can register and create a login using the Register tab. Once you log in, you can remain logged in unless your account does not allow that option. Alternatively, you can log out at the end of each session.
  • Meta-Analysis is now available in SVS. See Meta-Analysis for more information.
  • New features for importing variants through the Import > Import VCFs and Variant Files tool, see Import VCFs and Variant Files for full details.
    • Now able to import variants from text files provided by 23andMe, see Select Files to Import for more information.
    • Added support for importing gVCF files, see gVCF Conventions for specifications on this format.
    • Added the ability to import tumor/normal pairs, see Select Relationship for more information.
    • To avoid operating system limits on the number of files that can be open simultaneously, the selected files will be batched into groups of no more than 450 files which will then be merged to complete the import.
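The batching step described above can be sketched as splitting the selected file list into consecutive groups of at most 450 files, each of which can be imported with a bounded number of open files before the partial results are merged. This is a minimal illustration with a hypothetical helper name; the merge step is not shown and SVS's internal batching may differ.

```python
from typing import List, Sequence

MAX_FILES_PER_BATCH = 450  # limit quoted in the release note


def batch_files(paths: Sequence[str], batch_size: int = MAX_FILES_PER_BATCH) -> List[List[str]]:
    """Split the selected files into consecutive batches of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(paths[i:i + batch_size]) for i in range(0, len(paths), batch_size)]


# Hypothetical example: 1,000 selected files become batches of 450, 450, and 100.
files = [f"sample_{i:04d}.vcf.gz" for i in range(1000)]
print([len(batch) for batch in batch_files(files)])
```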

Bugs Fixed

  • Fixed the issue with missing counts in the DNA-Seq > Collapsing Methods > Count Variants per Gene tool that occurred when there were non-overlapping transcripts with the same gene name.
  • Fixed issue with Set Genotypes to No-Call based on Additional Spreadsheets that caused the software to crash when tool is started from a spreadsheet that does not contain genotype columns.
  • Convert Wizard bugs fixed:
    • Fixed error when a CSV file was selected for converting that caused the comma delimiter to be treated as part of the field data.
  • Allow GenomeBrowse to plot data from a VCF file that contains string arrays instead of giving the error that mapFields cannot be converted from StringArray to String.
  • Fixed an issue that caused Mac OSX menus and menu items for the Spreadsheet Editor to be unavailable.
  • Fixed Python annotation type selection to include one, multiple, or all possible types.
  • The Edit > Recode > Recode Genotypes tool will now work correctly for mapped and unmapped genotype data in the same spreadsheet.
  • Predicting Phenotypes from Existing Results now confirms that all selected covariates are present in the fixed effect coefficient output from K-Fold Cross Validation for Genomic Prediction.

Polishes

  • Import > Import VCFs and Variant Files polishes:

    • Allow specifying a field in a sample information text file that contains renamed sample names. This allows sample names to be mapped to new names and fills in the Renamed Sample field in the import wizard.
    • When importing always allow the FILTER option on the Import Summary page regardless of the field merge behavior.
    • Allow for genotype import from VCF files with no explicit GT format field defined in the header of the file.
    • Make it more obvious when an invalid segment is entered into the genomic regions input box and prevent importing until a valid segment is entered or the option is cleared.
    • Report any genomic region subset specified in the Import Wizard in the node change log.
    • Now suppressing progress for applying marker maps to the imported spreadsheets.
    • Provided more informative errors when unable to sort VCF data on import.
    • Trim white space from the sample field information after splitting on the specified delimiter.
    • Allow entity information to be imported from a text file regardless of the number of samples.
    • Added less strict matching for pedigree fields when importing family information from a text file.
  • Added additional output to GWAS results:

    • Mixed Linear Model Analysis added:

      • Actual sample size (after removing samples with missing genotypes) for each marker
      • Test (Minor) Allele and Other allele
    • Number of samples (all genotypic and numeric tests)

    • Number of cases and number of controls (all case/control genotypic and numeric tests)

    • Effect Direction, when appropriate one of these statistics is output:

      • Correlation R for most case/control tests
      • Armitage T for the Armitage and Exact Armitage tests
      • Change in the Dependent Average for the F test

      See General Statistics for more information on these statistics.

  • Convert Wizard polishes:

    • Added support for converting BED files with only chromosome, start position, and stop position.
    • Added support for vcf.gz (gzip) files.
    • Added the ability to copy source information from an existing annotation source to a new source through the Advanced Options.
  • In the Data Source Library, using the keyboard arrows to select a source will now update the information about that source in the Information Pane.

  • Clear completed downloads from the download manager after closing SVS.

  • Keep unmapped columns when subsetting visible markers from variant maps and LD plots in GenomeBrowse.

  • Added the option to select the column headers to be used in the Identifier field when exporting data to VCF files.

  • Recent projects that no longer exist are now shaded a lighter gray on the welcome screen and are italicized (except on MacOS) in the File > Recent Projects menu. The missing projects can now be cleaned up or all projects can be removed from the list with new menu options.

  • Gene and Interval annotation results now show all overlapping features in the region and allow for providing all fields from the annotation source or just the name identifier field.

  • Converted and replaced XML properties file with JSON properties file.

  • Cleaned up install to existing directory to remove deprecated library files.

8.3.4

New Features

  • Replaced current VCF importer with a new Import Wizard, Import > Import VCFs. This importer is optimized to bring in data faster as well as merge files with advanced options that were introduced in VarSeq.
  • Added allele information to the marker map after recoding genotypes to either DD/Dd/dd or numeric format.
    • If recoding based on the Reference allele, adds the Alternate allele field if it is not already present.
    • If recoding based on Major/Minor allele count, adds a Major Allele and Minor Allele field to the marker map.
  • Bayesian Genomic Prediction:
    • Added chain parameters to the options dialog including Number of Iterations, Burn-in, and Thinning.
    • Added option to set the initial value of pi.
    • Output a trace spreadsheet containing the sampled values for pi, variances, and number of markers included in the iteration.
    • Output numeric plots of the values in the trace spreadsheet.
  • K-Fold Cross Validation (see K-Fold Cross Validation):
    • Genotypes are now recoded to the numeric model first to ensure consistency between folds; multi-allelic markers are dropped.
    • Added observed phenotype to estimates by sample output.
    • Added the fold in which each sample was predicted to the final results spreadsheet.
    • K-Fold can now be run with multiple iterations, which allows iteration statistics to be compiled. See Statistics for more information.
  • Added Genotype > Predict Phenotypes from Existing Results script. This script takes the output of K-Fold Cross-Validation and applies the model to a new genotype spreadsheet (with or without covariates). See Predict Phenotypes From Existing Results for more information.
  • Plot > Autocorrelation Plots is now available. It can be used in conjunction with the trace output from Bayesian Genomic Prediction or with other data.
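The new chain parameters and autocorrelation plots work together: burn-in discards the early samples drawn before the chain settles, thinning keeps every k-th remaining sample, and the autocorrelation of the retained trace shows how strongly successive samples still depend on each other. The NumPy sketch below illustrates these steps on a single trace column; the function names are hypothetical and this is not the code behind the SVS tools.

```python
import numpy as np


def thin_chain(trace, burn_in, thinning):
    """Discard the first burn_in samples, then keep every thinning-th sample."""
    return np.asarray(trace, dtype=float)[burn_in::thinning]


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag (x must not be constant)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    # r_k: sum over t of d[t] * d[t + k], normalized by the lag-0 sum of squares.
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])


# Hypothetical trace of pi values: drop 1,000 burn-in samples, keep every 10th.
rng = np.random.default_rng(1)
trace = np.cumsum(rng.normal(size=6000)) * 0.001 + 0.5
print(autocorrelation(thin_chain(trace, 1000, 10), max_lag=5))
```

If the autocorrelation stays high at small lags, increasing the thinning interval or the number of iterations is the usual remedy.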

Bugs Fixed

  • Fixed numeric representation for operating systems with the region set to a location that uses different thousands or decimal separators. Now always uses ”,” as the thousands separator and ”.” as the decimal separator.
  • Fixed index assignment operator for arrays so Expression Editor can use specific entries to calculate new fields.
  • Genomic prediction methods GBLUP, Bayesian, and K-Fold bug fixes:
    • Check for covariance
    • Handle duplicate row labels by using the row label and index for matching.
  • Fixed importing of INFO files in the Import Impute2 GWAS Files script so that it now accepts one INFO file per genotype file.
  • Fixed the spreadsheet Scripts menu on MacOS X so that scripts saved in this menu can be selected. The menu also now stays visible when no scripts are available.
  • Fixed setting image size for Save as Image from Plot Viewer (non-genomic) plots. This bug was introduced in version 8.3.2.
  • Fixed Compare Variants Across Several Spreadsheets tool so it now works correctly for spreadsheets that have no marker map.
  • Prevented crash when renaming marker mapped labels to Chromosome:Position when there is a marker-mapped dependent column.
  • On MacOS X prevented hangs after:
    • Joining and merging spreadsheets
    • Creating top-level spreadsheets
  • Fixed error with Calculate Alt Read Ratio script when additional phenotype columns are present.
  • Prevented failure to create marker map from spreadsheet when there is a “slash” in the marker map name.
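With the fixed convention from the first item above, a value is always written as, for example, 1,234,567.89, whatever the operating system's regional settings. In Python, for instance, the ',' format option produces this result independently of the locale, whereas the 'n' presentation type follows the current locale. The helper below is a hypothetical illustration of that behavior, not SVS code.

```python
def format_number(value: float, decimals: int = 2) -> str:
    """Format with ',' for thousands and '.' for decimals, regardless of locale.

    The ',' option in Python's format specification always inserts commas;
    only the 'n' presentation type consults the current locale.
    """
    return f"{value:,.{decimals}f}"


print(format_number(1234567.891))  # 1,234,567.89
```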

Polishes

  • Added additional output to Fisher’s Exact Test for Binary Predictors, including direction of effect, per-marker sample totals, and odds ratio confidence intervals.
  • Updated default RefSeq gene annotation source based on GRCh37_g1k build to include Locus Reference Genome (LRG) identifiers, updated source name is RefSeq Genes 105v2, NCBI.
  • Modified warning message for Recode Genotypes to be more informative when using the Reference vs Alternate option and fewer than half the mapped markers have a genotype with a reference allele. Also, the recode dialog is now disabled while processing.

8.3.3

Bugs Fixed

  • Fixed connection to Concurrent License Server

8.3.2

New Features

  • Added K-Fold Cross Validation for genomic prediction using either GBLUP, Bayes C-Pi, or Bayes C. For more information see K-Fold Cross Validation.
  • GenomeBrowse Features:
    • Save As Image now includes the option for saving in SVG format.
    • Added support for managing Genome Assembly views in the GenomeBrowse window; genome segment order and visibility can now be changed. See Managing Genome Assembly Views for further details.
  • Added fold change output to Matched Pairs T-Test. See Matched Pairs T-Test for a full description of the additional output.
  • Import Impute2 GWAS Files importer now has the option to include INFO file data. Imported INFO data will be placed in the marker map of each created spreadsheet.
  • Autocorrelation plots are now available for numeric columns from the Plot menu. See Autocorrelation Plots for more information.
  • Added Variant Databases as a default location in the Data Source Library under Local Annotations if it exists.
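K-fold cross validation partitions the samples into K folds and holds out each fold once, training on the remaining folds and predicting the held-out samples. The sketch below shows a random fold assignment and the resulting train/held-out splits; the random partition, the seed handling, and the function names are assumptions for illustration, and SVS's own fold assignment may differ.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def assign_folds(sample_ids: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Shuffle samples and deal them into k folds whose sizes differ by at most one."""
    if not 2 <= k <= len(sample_ids):
        raise ValueError("k must be between 2 and the number of samples")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def train_test_splits(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (training samples, held-out samples), holding out each fold once."""
    for i, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, held_out


# Hypothetical example: 10 samples, 3 folds.
folds = assign_folds([f"s{i}" for i in range(10)], k=3)
for training, held_out in train_test_splits(folds):
    print(len(training), len(held_out))
```

Running several iterations with different seeds gives different partitions, which is what makes iteration-level statistics possible.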

Bugs Fixed

  • Fixed crash when selecting Bayesian Genomic Prediction from the Genotype menu when there wasn’t an appropriate column for gender classification of samples.

  • Fixed sorting of fields in the Data Source Library to be in numeric order for certain fields including the size of the sources.

  • Fixed crash and Python error when trying to create Python matplotlib plots such as Dendrograms and Heatmaps on Linux and RHEL platforms.

  • GenomeBrowse bugs fixed:

    • Plots now correctly maintain y-axis zoom levels when selecting Save As Image or when adjusting the x-axis position.

    • When multiple values are present at the same location with the same feature name (such as SNPs from a Haplotype Association Test results spreadsheet), all of the results are now drawn in the numeric p-value plot.

      Note

      The y-range for existing spreadsheets and plots may need to be adjusted manually to capture all of the existing data points that are now being drawn. This is not necessary for plots created from new spreadsheets or new projects.

    • Fixed subset options for LD Plots. The Marker Blocks > Subset > Visible and Marker Blocks > Subset > Selected Block now return the correct markers based on the current zoom.

  • Fixed right-click Open Folder on location sub-folders in the Data Source Library.

  • Fixed crash on close of GenomeBrowse window when non-human data was being plotted and no local copy of the human reference sequence was downloaded.

  • Haplotype Association Tests no longer produce 1.#INF values in the Odds Ratio Upper 95% CI column; these infinite values are now set to missing.

  • Fixed the issue with mouse clicks not registering after running Genotype Statistics by Sample on the Mac OSX.

  • Removed phantom progress bar from the Edit > Recode > Rename Alleles tool.

  • Fixed the issue with gender being inferred incorrectly when a dependent column was set for Genotype Statistics by Sample

  • Prevent INF and NAN values in fold-change columns of T-tests from Numeric > Numeric Association Tests.

Polishes

  • Renamed File > About Project to File > Project Info to provide stability in Mac OSX menus.
  • Updated DNA-Seq > Variant Binning by Frequency Source tool:
    • Now matching alleles present in variants to the Ref/Alt alleles in the selected source.
    • If there are more than three alleles present for any variant in the spreadsheet the entry from the source containing all alternate alleles will be used for binning and the minimum frequency listed will be used.
    • Now able to bin variants using frequency array fields like those found in the 1kG Phase3 - Variant Frequencies 5, GHI annotation source.
  • Added a more informative error message when SVS is unable to recover an auto-saved version of the project file.
  • Adjusted layout of Add Location dialog in the Data Source Library to prevent the dialog from being too small for its contents.

8.3.1

Bugs Fixed

  • Fixed Python error in KBAC with Regression when selecting No but precompute reduced models... for the Impute Wild Type... option.
  • SKAT-O bugs fixed:
    • Analysis will now run with the “Uniform” parameter selected for Marker Weighting.
    • Fixed an index error that occurred when regions in the data had more markers than available samples; the script now correctly skips over these regions.
  • On Mac OSX, fixed the bug that prevented scripts from being launched from submenus in the project navigator or from a spreadsheet.
  • Fixed handling of missing covariate information in Bayesian Genomic Prediction when “Predict random effects for samples with missing phenotypes” is selected.
  • GenomeBrowse bugs fixed:
    • Clicking on a folder URL now launches a file explorer window at that location instead of trying to launch and failing.
    • Fixed console information for categorical array fields to have the correct size instead of a list of missing values.
    • In the Expression Editor, the “Chr” field is now correctly handled as a string and the “Start” and “Stop” fields are correctly handled as integers. This will now allow you to add expressions such as “Chromosome”, “Start”, “Stop”, “Stop - Start <= 2”, etc.

Polishes

  • File > Apply Genetic Marker Map is now case insensitive for matching marker names.
  • Updated DNA-Seq > Annotate and Filter Variants:
    • dbNSFP default options set to handle filtering of the 5 common scores.
    • Added support for new frequency data (1kG Phase3 and ExAC), so now able to filter based on unique Ref/Alt allele matching in a list of frequency values.
  • KBAC with Regression and CMC with Regression scripts now include the option to output the intermediate variables used in the calculations.
  • Updates to SKAT-O:
    • Included option for Madsen Browning weighting which assigns more weight to rare variants than common variants.
    • Allow rho of exactly one for generalized SKAT test.
  • Added option in Import > PED/TPED/BED and Export > PED/TPED/BED for PLINK import and export to keep chromosome names “as is”. This allows importing data in PLINK format from genomes that have chromosomes with non-numeric names. Made PED and TPED export also map non-autosomes to numeric chromosome identifiers if not exporting “as is”.
  • Always standardize phenotype values for Bayesian Genomic Prediction to make the method more robust.
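Madsen-Browning weighting divides each variant's contribution by its expected standard deviation, so rare variants count for more than common ones. The sketch below follows the weight published by Madsen and Browning (2009), where the allele frequency is estimated from unaffected samples with a +1/+2 correction. How the SKAT-O script in SVS estimates the frequency may differ, and the function name is hypothetical.

```python
import math


def madsen_browning_weight(minor_alleles_unaffected: int,
                           n_unaffected: int,
                           n_total: int) -> float:
    """Weight for one variant (hypothetical helper).

    q is estimated from unaffected samples with a +1/+2 correction. The
    variant's contribution is divided by sqrt(n * q * (1 - q)), so the
    returned multiplier is the inverse of that quantity.
    """
    q = (minor_alleles_unaffected + 1) / (2 * n_unaffected + 2)
    return 1.0 / math.sqrt(n_total * q * (1 - q))


rare = madsen_browning_weight(1, 500, 1000)
common = madsen_browning_weight(200, 500, 1000)
print(rare > common)  # True: rarer variants receive more weight
```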

8.3.0

New Features

Bugs Fixed

  • Fixed the variance component estimates for GBLUP and MLMM. These estimates were systematically inflated. The only results affected were the variance component estimates themselves. All other results were unaffected.
  • Import Impute2 no longer gives errors if:
    • A chromosome number is not in the name of the genotype file
    • Only dosages are imported and genotypes are not converted.
  • Saving data in the VCF format now handles real (float) valued fields correctly. Also fixed printing of missing value fields in the INFO field.
  • Exporting data in the VCF format from the data source library now:
    • Correctly places the RS ID in the identifier field instead of creating a new INFO field.
    • Replaces white space in symbol names with underscores.
  • Allele classification choice is now remembered for various genotype functions that have an allele frequency vs. ref/alt allele choice, including Genotype Statistics by Marker, Genotype Filtering by Marker, Genotype Principal Component Analysis, and Genotype Association Tests.
  • Annotate and Filter Variants bugs fixed:
    • Allow filtering with variant tracks if the series name for the annotation source is empty.
    • Do not fail on variant sources that do not have a build with a valid dbNSFP source.
    • Present informative error when a reference allele field is missing in the spreadsheet’s marker map instead of presenting an uninformative python error.
  • Updated Score Compound Heterozygous Regions to handle spreadsheets that contain data from both chromosomes and scaffolds. It now skips all segments that are not also present in the gene annotation source while correctly analyzing those that are present.
  • Bug fixed in Mendelian Error Check. If the “Report Mendelian Inconsistencies” option was selected, it resulted in the “Mendelian Errors by Marker” output being off by six marker rows. If the “Remove Mendelian Inconsistencies” option was selected, it resulted in the “Removed Mendelian Inconsistencies” output being off by six marker columns.
  • Fixed bug in Variant Classification where filtering was selected but the corresponding spreadsheet was not selected for output.
  • Prevented Fixation Index Fst from crashing when a categorical dependent column was selected with more than 20 categories.
  • Nonlinear Regression of R Squared on Distance no longer errors out when there are no missing values.
  • Fixed column move functions in the spreadsheet editor to only move the selected column and not the selected column and an adjacent column. This affects Move Column > Move Column to the Front and Move Column > Move Column to the Far Right.
  • Converting NCBI GFF files to an annotation source using Convert NCBI GFF Files to Annotation Track will use the assembly file to rename the “NC_” segment names to the chromosome names used in GenomeBrowse for plotting the annotation sources.
  • Notes and tags for Regression Viewers are now being correctly saved and reloaded on close and reopen of projects.

Polishes

  • Added option to include a note when saving an edited spreadsheet that will be included in the node change log for the spreadsheet. The node log now also indicates that the spreadsheet was created by editing a spreadsheet and contains the name of the original spreadsheet.
  • Genotype Statistics per Sample now uses the assembly file for the project species and build to identify allosomes for Gender Chromosome Statistics.
  • Added phenotype (dependent column name) to node change log for GBLUP and Mixed Linear Model Analysis.
  • Added total number of runs per sample to Numeric > CNAM Output > Count Number of Segments Per Sample.
  • Removed preview boxes for the Expression Editor when adding a Filter by Sample in GenomeBrowse for Variant Maps and Heat Maps.

8.2.1

New Features

  • New GenomeBrowse features:
    • Added a “slice” function to the expression editor to strip out characters from data. For example, this allows values such as “[A/G]” to be treated as A/G for styling purposes.
    • The Feature List now remembers the previous maximum number of features displayed for the same source. Additional features are now added in chunks of up to 10,000 features.
    • Variant Maps now have a Subset Visible option on the Filter tab in the control panel to create a subset spreadsheet of all the variants in the current zoom.
    • GenomeBrowse can now use alias names from a marker map when plotting data from spreadsheets.
  • Added nonlinear regression of LD Decay on distances for LD reports, see: Nonlinear Regression of LD R Squared on Distances for more information.
  • Updated Annotate and Filter Variants to work with dbNSFP v2.6 sources. Our curated annotation source dbNSFP NS Functional Predictions 2.6, GHI now includes VEST scores for non-commercial use only as well as all raw and rank scores. Also provided is a new light-weight source dbNSFP Predictions 2.6, GHI that only includes the prediction scores for SIFT, Polyphen2 HumVar, MutationTaster, MutationAssessor and FATHMM.
  • Created Python API to run precompute and indexing of fields for annotation sources.

Bug Fixes

  • GenomeBrowse bugs fixed:

    • Exporting a VCF file in the VCF format from the Data Source Library now correctly copies FORMAT field information. Some numeric array fields were being classified as string arrays.
    • Fixed Amino Acid rendering in MT Chromosome. Now, the amino acids in MT use the MT table instead of the standard amino acid table. See: https://www.mun.ca/biology/scarr/MGA2-03-28_mtDNA_code.jpg
    • Convert Source Wizard bugs fixed:
      • Allow segments/scaffolds/chromosomes to be skipped in the segment list.
      • Automatically update documentation when converting NCBI GFF files to annotation sources instead of using only Human genome information.
      • Fixed conversion of GTF files with no exon features to an annotation source; this should enable the Convert Source Wizard to work for GenCode GTF files.
      • Fixed detection of list/array fields when the first occurrence of a list is not in the first 1000 lines. The converter also now properly handles a field that contains a mixture of no values, single values and lists of values.
    • Suppress incorrect “GRCh_38” assembly from the assembly list if present.
    • Prevent crash when sorting by a categorical array field in the Feature List.
    • Fixed Export to VCF from an annotation source to prevent a crash when the doc string is missing.
    • The shipped RefSeq Genes 105, NCBI source was indexed to allow searching on gene and transcript fields.
    • Prevented searching non-indexed annotation sources to avoid cluttering the Search and Location Bar.
    • Fixed crash when visualizing a region that included a coverage transform at the exact boundary of a chunk of data.
  • VCF import:

    • Fixed the handling of more cases of combining features of the same type at the same location when importing multiple single-sample VCF files.
    • Create a unique list of alternate alleles when merging multiple single sample VCF files.
    • Correctly classify complex variants as mixture variants when multiple alternates exist.
  • Added several checks for collinearity with covariates and permuted Y values in Mixed Model KBAC to prevent crashes and display informative error messages.

  • Correctly handle missing float values in the following functions:

    • Genomic BLUP
    • GBLUP Genomic Relationship Matrix
    • Mixed Linear Model Analysis
    • Mixed Model KBAC
    • DESeq count data
    • Dendrograms and Heatmaps input data

    If the phenotype or the covariates were single-precision float values instead of doubles, missing values were being treated as extremely large negative values instead of as missing, which could have produced incorrect results.

  • Select > Activate Rows based on Multiple Columns now captures all values; previously, the last 5 values in the list were not displayed for columns with hundreds of values.

  • Fixed Annotate and Filter Variants gathering of the node ID from the spreadsheet when using the SIFT only annotation source.

  • Updated Annotate and Filter Variants to not match alleles when the variant is an insertion for the following annotation source types:

    • dbNSFP NS Functional Predictions & dbNSFP Predictions
    • SIFT only annotation source
  • Fixed downloading issues for Mac OS X including downloading:

    • Annotation sources from the Data Source Library
    • Genomic marker maps from Golden Helix
    • GC Digest files for Wave Detection and Correction, etc.
  • Fixed how Variant Classification handles amino acids for MT variants. Now properly identifies start and stop codons as well as uses the MT specific amino acid table, see: https://www.mun.ca/biology/scarr/MGA2-03-28_mtDNA_code.jpg
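The MT fixes depend on translating codons with the mitochondrial table rather than the standard one. The sketch below uses NCBI translation table 2 (vertebrate mitochondrial), which differs from the standard code at four codons. It is an illustration and not the SVS implementation, and the `translate` helper is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code (NCBI table 2): four codons differ.
VERT_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(seq: str, mitochondrial: bool = False) -> str:
    """Translate complete codons of a DNA sequence ('*' marks a stop)."""
    table = VERT_MITO if mitochondrial else STANDARD
    seq = seq.upper()
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


print(translate("ATATGAAGA"))        # I*R
print(translate("ATATGAAGA", True))  # MW*
```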

Polishes

  • GenomeBrowse polishes:
    • Fixed font in data console for source information.
    • Added automatic indexing of gene and transcript fields to the NCBI GFF conversion script.
    • Updated the default recent genome assembly list to include the three most recent human genome assemblies, and the most recent mouse and rat assemblies.
  • Improved memory handling for Runs of Homozygosity and better warnings for running out of memory when performing advanced clustering methods.
  • Minimum Read Threshold function has been updated and can now take a real-valued threshold to use for the number of reads.

8.2.0

New Features

  • New method MM-KBAC adjusts for relationships between samples using a random effects matrix with KBAC (DNA-Seq > Collapsing Methods > Mixed-Model KBAC); see Mixed-Model Kernel-Based Adaptive Cluster (KBAC) Method for more information.
  • Compute the numerator relationship matrix from a pedigree. See Computing the Numerator Relationship Matrix for more information.
  • Option to predict missing phenotypes using Genomic BLUP. See Genomic Best Linear Unbiased Predictors Analysis for more information.
  • Runs of Homozygosity has been updated with new parameters to define a run and new clustering algorithms that more accurately detect clusters of runs across all samples. See Runs of Homozygosity Analysis for more information.
  • Added Runs of Homozygosity for NGS that has an additional option to treat missing values as homozygous reference. See Runs of Homozygosity Analysis for more information.
  • New default RefSeq Gene Annotation Source based on GRCh37_g1k build curated directly from NCBI.
  • Added source convert utility to convert NCBI Gene GFF3 files to TSF format for analysis and visualization in SVS and GenomeBrowse.
  • Added 95% confidence intervals to the Fixation Index Fst output. See Algorithm Used for Confidence Intervals for more information.
  • Added the ability to select the allosome (sex chromosome) to use for gender inference in Genotype > Genotype Statistics by Sample. This will allow non-human researchers to pick the chromosome that is best used for gender inference as it may not always be ‘X’. See Genotype Statistics by Sample for more information.
  • Added an FAQ section to the SVS Manual. See Frequently Asked Questions for more information.
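The numerator relationship matrix can be built with the classic tabular method. The sketch below shows that textbook method and is not necessarily how SVS computes it. The pedigree encoding is an assumption: a list of (sire, dam) index pairs with None for an unknown parent, with parents listed before their offspring. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Parent = Optional[int]


def numerator_relationship_matrix(
        pedigree: List[Tuple[Parent, Parent]]) -> List[List[float]]:
    """Tabular method for the numerator relationship matrix A.

    pedigree[i] = (sire_index, dam_index); None marks an unknown parent.
    Parents must appear before their offspring (hypothetical encoding).
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        # Relationship with every earlier individual: half of each parent's.
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        if s is not None and d is not None:
            A[i][i] = 1.0 + 0.5 * A[s][d]
        else:
            A[i][i] = 1.0
    return A


# Founders 0 and 1, full sibs 2 and 3, and a child (4) of the two sibs.
A = numerator_relationship_matrix(
    [(None, None), (None, None), (0, 1), (0, 1), (2, 3)])
print(A[2][3])  # 0.5
print(A[4][4])  # 1.25
```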

Bugs Fixed

  • Fixed canceling Numeric > Statistics (per Column) when not splitting the output by Chromosome.
  • Added the missing Statsmodels.api datasets module required for Python scripts using statsmodels; this module was missing on all operating systems except Win64 and Mac OSX.
  • Fixed how Import > Import Impute2 GWAS Files handles indels; common bases in the allele1 and allele2 strings are now only stripped off if the user chooses to strip and shift the positions.
  • Updated Discretized CN Segment List to handle the additional output columns that were recently added to the CNAM Segment List spreadsheet.
  • Haplotype Trend Regression options were not being properly saved and restored. Now, if you change the options, or start computing HTR, cancel, and then change the options, the new options will be used for the next HTR analysis.
  • GenomeBrowse Source Convert Wizard:
    • Can now handle enumerated or string lists for features.
    • Can now compute extents for very small float64 values (e.g. p-values) on Linux and Mac OS X.
  • GenomeBrowse bugs fixed:
    • Save as Image will now respect the current zoom if the region was obtained by entering the location in the genomic location bar without also scrolling to zoom.
    • Deleting one or more plots by selecting them in the plot view and pressing the delete key no longer crashes the program.

Polishes

  • Changed system default genome assembly to GRCh_g1k to take advantage of the best mitochondrial reference sequence and the new RefSeq gene annotation source with updated mitochondrial gene annotations.
  • Fixed the label for base pair specification for the Numeric > Numeric Regression Analysis to be consistent with how it is used (changed from kb to base pairs).
  • Updated error messages for DNA-Seq > Set Genotypes to No-Call based on Additional Spreadsheets to be more informative when there are duplicate row labels, duplicate column headers, no overlapping rows, and no overlapping columns.
  • Removed warning about mixing annotation assemblies in DNA-Seq > Annotate and Filter when including sources from GRCh37, GRCh_37_hg19 and GRCh_37_g1k.
  • Added options to DNA-Seq > Variant Classification to not use predicted, incomplete, non-coding or precursor RNA transcripts for variant classification.
  • Added NCBI segment identifiers to human, bovine and latest canine genome assemblies.

8.1.5

New Features

  • Added option to use a marker map field for Activate by Gene List. The marker map can either be applied to a second spreadsheet’s rows or columns.
  • Updated Score Variants by Dominant Model to produce one score per family if pedigree information is applied to the genotype spreadsheet that is being scored. Also polished the dialog and options to correctly reflect the behavior of the feature.
  • New GenomeBrowse feature:
    • Added option for variant sources to left-align indels using a reference sequence source.
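Left-aligning an indel moves it to the leftmost position where it still describes the same change to the reference. The sketch below implements the widely used normalization loop that trims a shared trailing base and extends left from the reference when an allele becomes empty. It assumes 1-based positions and a reference given as a plain string, and it is not necessarily the exact SVS implementation. The function name is hypothetical.

```python
def left_align(pos: int, ref: str, alt: str, refseq: str) -> tuple:
    """Left-align a biallelic indel against a reference sequence.

    pos is 1-based and refseq[0] is position 1 (assumed conventions).
    """
    changed = True
    while changed:
        changed = False
        # Drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if (not ref or not alt) and pos > 1:
            prev = refseq[pos - 2]
            ref, alt, pos = prev + ref, prev + alt, pos - 1
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


# Deletion of one "CA" unit from the repeat in GCACACAT, given at its
# rightmost representation, moves to the start of the repeat.
print(left_align(5, "ACA", "A", "GCACACAT"))  # (1, 'GCA', 'G')
```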

Bugs Fixed

  • GenomeBrowse specific bugs:
    • Spreadsheet-based value plots in GenomeBrowse are now reporting the chromosome numbers for features in the data console.
    • Fixed GTF source conversion to adjust CDS start/stop based on frame and allow for CDS to jump introns.
    • Fixed crash when reordering plots on MacOSX by clicking and dragging one or more plots in the plot view and clicking on the GenomeBrowse window before the plots finished drawing.
  • Fixed the Python error regarding an uninitialized variable for all collapsing methods when the genotype/variant spreadsheet had data in a chromosome that was not present in the gene annotation source.
  • Improved guarding against invalid kinship matrix selection for Mixed Linear Model Analysis and Compute Genomic BLUP (GBLUP). Now an error message will be presented instead of the feature crashing.

8.1.4

Bugs Fixed

  • DNA-Seq > Set Genotypes to No-Call based on Additional Spreadsheets incorrectly applied the marker map to the column subset spreadsheet if there were unmapped columns in the original spreadsheet. This resulted in incorrectly mapped spreadsheets from any spreadsheet created from that subset spreadsheet.
  • Selecting permutation testing for CMC with Hotelling T-Squared Test resulted in a Python error.

8.1.3

New Features

  • Added carrier count option to Genotype > Genotype Statistics by Marker and Genotype > Genotype Filtering by Marker. Deprecated temporary script.
  • Added Numeric > Matched Pairs T-Test. This will work on any numeric data with a sample label column (with two instances of each sample name), a grouping column with two categories, and numeric data columns. In particular, this will work for RNA-Seq data.
  • Mixed Linear Model Analysis now outputs Beta coefficients and Beta Standard Errors for all cofactors, covariates and SNPs.
  • New features related to GenomeBrowse:
    • Added the ability to save annotation sources as VCF, XLS, FASTA, and WIG files.
    • Updated and expanded the functionality of the Expression Editor for filtering and added it to the Source Convert Wizard. Now fields can be computed based on existing fields for either filtering or adding to annotation sources.
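For a single numeric column, the matched pairs t-test reduces to a one-sample t-test on the within-sample differences. The sketch below assumes rows of (sample label, group, value), with each label appearing once in each of the two groups, as the new feature describes. The function name is hypothetical. SVS may handle incomplete pairs or zero-variance differences differently, and here a zero standard deviation raises an error.

```python
import math
import statistics
from collections import defaultdict


def matched_pairs_t(rows, group_a, group_b):
    """Paired t statistic from (sample_label, group, value) rows.

    Samples missing either group are skipped (an assumption).
    Returns (t, degrees_of_freedom).
    """
    by_sample = defaultdict(dict)
    for sample, group, value in rows:
        by_sample[sample][group] = value
    # Difference (group_b - group_a) for every complete pair.
    diffs = [v[group_b] - v[group_a] for v in by_sample.values()
             if group_a in v and group_b in v]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean / (sd / math.sqrt(n))
    return t, n - 1


rows = [("s1", "before", 5.0), ("s1", "after", 6.0),
        ("s2", "before", 4.0), ("s2", "after", 6.0),
        ("s3", "before", 7.0), ("s3", "after", 10.0)]
t, df = matched_pairs_t(rows, "before", "after")
print(round(t, 3), df)  # 3.464 2
```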

Bugs Fixed

  • GenomeBrowse specific bugs:
    • Fixed the following GTF source conversion problems:
      • Exon starts and stops were listed in strand order instead of genomic order. This only resulted in issues when performing Variant Classification using these files.
      • Out-of-frame transcripts are now accounted for, resulting in fewer invalid transcripts.
      • Added source field GTF file source info.
    • Fixed Convert GFF Files to Annotation Track to work on files that do not have mRNA features.
    • Replaced tabix with htslib to handle symbol collision. This should fix all crashes that were occurring when trying to compute coverage and index on BAM files and indexing and compressing VCF files.
    • Fixed help links for data source library, source convert wizard and on plot help links.
    • Fixed click issues for Marker Blocks on LD plots. It should now be easier to select the correct marker and marker block.
    • GAserver has been updated to handle alias chromosome naming for remote sources.
  • Fixed PBAT bugs:
    • If there are no phenotype columns with active non-missing numeric data, present an error message instead of creating the PBAT dialog without any valid phenotype options.
    • Disable the genetic model field for binary pre-study calculations when penetrance values are input by the user.
  • Collapsing methods now properly collapse transcripts for the same gene that are interrupted by another gene. These interrupting genes were causing two results with the same gene name. Because genes are now collapsed more thoroughly, results can change for CMC, KBAC and Count Variants per Gene.
  • Fixed an off-by-one error that prevented the last or only variant in a chromosome from being detected as in a gene for Filter by Gene List.
  • Annotate and Filter for variant frequency sources has been fixed to prevent an error terminating the function when no matches to the frequency track exist.
  • Prevent a crash when finalizing a zero feature TSF file.

Polishes

  • Polishes related to GenomeBrowse:
    • Added the “*.fna” extension to FASTA file options for the source convert wizard.
  • Placed Spreadsheet Editor Column move actions in the Move Column submenu.
  • Hide Annotation Sizer in Save as Image for non-genomic plots.

8.1.2

New Features

  • New features related to GenomeBrowse:
    • Added support for streaming BAM files from HTTP and FTP (Active or Passive).
  • Added an API to get the SVS version number.
  • New script Calculate Carrier Counts by Variant that also computes the counts per variant call (genotype) and splits counts by group or averages the quantitative dependent variable by carrier status.

Bugs Fixed

  • GenomeBrowse specific bugs:
    • Fixed computation of BAM indexes and coverage files by upgrading to the latest htslib library.
    • Pileups now update after changing options without needing to force an update.
    • Made it possible for interval tracks with zero width features to label all non-zero width features.
    • Assembly aliases are now used consistently in readers to re-map segment names.
    • Made sure that each GenomeBrowse node remembers whether or not the feature list was shown or hidden.
    • Fixed a regression where only one computation was run on a source, so a source requiring both an index and a coverage computation would not start the coverage computation after the index finished.
  • Updated SVS version in Export Variant Call File (VCF). The version should always reflect the current version of SVS now.
  • Prevent an error with Mendelian Error Check if the row labels are not unique. If they are not unique, the string “FamilyID-PatientID” is used instead. If those strings are not unique an informative error is presented.
  • Fixed a bug recently introduced that prevented VCF files with different numbers of INFO fields from being imported simultaneously.
  • Added SVS version information to the Help > About SVS dialog.
  • Prevented Variant Classification from picking a network reference sequence source by default if a local source was not detected.
  • Fixed bug that prevented the tabix TBI file from being generated for a >2GB bgzip compressed VCF file on Windows.
  • Fixed identification of upstream positions in Annotate and Filter for gene sources when filtering on Exon regions and an additional upstream/downstream distance was specified.

Polishes

  • Polishes related to GenomeBrowse:
    • Allow for strand to be set as a categorical type in annotation sources.
    • Value plots now return the value displayed in the plot as well as the genomic location in the data console.
    • Use the color field from a BED file if there is a color specified in the file when plotting.
    • Increased insertion bar visibility.
  • Added a Total column to the Find de Novo Candidate Variants report.
  • Allow for integer chromosome columns to be used for Create Marker Map from Spreadsheet.
  • Updated Annotate and Filter to work with all newer versions of dbNSFP.
  • Updated LD Reports to output the signed D statistic.
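The signed D statistic keeps the direction of association between alleles at two markers, which r² and |D'| do not. The sketch below computes D = p_AB - p_A·p_B from the four haplotype frequencies. It assumes those frequencies have already been estimated, which SVS does internally and is not shown here. The function name is hypothetical.

```python
def signed_ld_d(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """Signed LD coefficient D = p_AB - p_A * p_B for two biallelic
    markers (alleles A/a and B/b), from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first marker
    p_B = p_AB + p_aB  # frequency of allele B at the second marker
    return p_AB - p_A * p_B


print(signed_ld_d(0.5, 0.0, 0.0, 0.5))  # 0.25: A travels with B
print(signed_ld_d(0.0, 0.5, 0.5, 0.0))  # -0.25: A travels with b
```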

8.1.1

This was a Windows OS only release.

Bugs Fixed

  • Fixed the LevelDB dependency on the InitOnceExecuteOnce API that was causing SVS to fail to run on Windows XP (both x32 and x64).

8.1.0

Note

Although 8.1.0 is the first public release of SVS 8 (not requiring an opt-in), many of the new features were released in 8.0.0 and 8.0.1. For a summary of what’s new in SVS 8, see our What’s New page.

New Features

  • A new Annotation Convert Source Wizard enables converting many standard file formats into TSF files that can be used in GenomeBrowse visualizations as well as annotation and filtering workflows.

    • The import process allows for documentation to be written for a source (it can be edited after conversion as well).

    • GTF can be converted directly as a gene source.

    • FASTA/FA files can be used to curate new genomes (previously this required hand-crafting an assembly file).

    • All computations on a source, such as computing a coverage map for visualization, are done automatically during the convert process.

    • Our annotation repository has all historically published sources still being served, but converted to TSF for convenience and for TSF specific features. Existing downloaded IDF annotations will continue to work.

    • The default shipped gene track has been updated to the latest RefSeqGenes. All annotation sources now show their version or curation date as part of their name with only the latest shown by default.

    • The Data Source Library has a new method for Exporting Data.

  • Plots can now be saved from a GenomeBrowse window (see Saving Plots from a GenomeBrowse Window). The currently visible plots are rendered to a static image with configurable options and can be saved in various image formats.

  • LD plots now have a Marker Blocks Tab, which allows you to compute marker block sets as well as load and save them.

  • Haplotype Trend Regression was added as an analysis option for SNP association testing.

  • The dbNSFP track was updated and the annotation and filtering functionality now requires the latest 2.3 version.

  • The Filter by Marker Map Field add-on script is now shipped by default; it can be found in the Select menu and is called Select Variants by Filtering on a Marker Map Field.

  • Categorical columns can now visualize their unique values in a Pie Chart.

Bugs Fixed

  • The use of indexed TSF fields for searching the location bar fixes a long-standing issue where gene names were not returned in prefix match order and hence certain genes were not listed in the first 10 matching results (such as ZNF2).
  • When running Annotate and Filter, we now keep genomic (but not genotype) columns in the filtered subsets.
  • Fixed Mac OS X crashes when double right-clicking in a spreadsheet viewer. Generally fixed Mac OS X stability.
  • Fixed crash when adding BED files that specified color styling to GenomeBrowse.
  • Fixed a crash when filtering sources and scrolling from a project in the Data Source Library.
  • Half-called genotypes from VCF files are now imported as missing values.
  • InDels in start and stop codons were being classified as Synonymous in Variant Classification. They are now properly classified as length polymorphisms.
  • The VCF importer now handles files where both an INFO and a FORMAT field use the same identifier.
  • KBAC and CMC with Regression now handle collinearity and failed logistic regressions more gracefully.
  • A number of edge cases in the download manager that prevented downloads from starting have been fixed.
  • A crash in PBAT haplotype analysis when a default option was unchecked was fixed.
  • The binary PLINK format BED/BIM/FAM can now be imported with multi-character alleles specified.

Polishes

  • The layout and interactive components of GenomeBrowse have been rewritten to be more polished and flexible.
  • Each plot now has four potential y-axis zoom modes, with a default picked appropriately for the plot type.
  • Annotate and Filter Variants has been updated to output all fields for interval tracks it is annotated against. It has had a number of other improvements done to it, such as always opening the final variant report.
  • Many annotation sources now have hyperlinks at the field level that can be clicked through the Feature Table or the Console output when clicking on a given feature.
  • Styles set on tracks can be saved for annotation sources so that they are used when being added to new views.
  • The Numpy/Scipy package used by many analysis features has been updated. The Fisher's Exact Test for Binary Predictors has been updated to use the scipy fisher_exact implementation.
  • When applying a style with many colors, you can auto-assign colors using a number of presets and options.
  • The About dialog now has a list of Credits.
  • Value plots can now have drop-lines as connector types.
  • The DNA-Seq analysis features dealing with trios have been updated to be more consistent in how they handle pedigree data and what they produce for reports.
  • Filter by Gene List is no longer case sensitive.
  • Stepwise regression is always permitted as a regression option.
  • When VCF files are read (both for import and visualization), all common bases between the Ref and Alt fields are removed and the variant is re-ordered at its new position. Alleles are also upper-cased for consistency.
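Removing the common bases shared by Ref and Alt changes the variant's start position, which is why the variant is re-ordered afterwards. The sketch below illustrates the idea. It assumes that common trailing bases are trimmed before common leading bases and that an emptied allele is written as "-". Both are assumptions, and in repeat regions the trimming order can change the resulting position. The function name is hypothetical.

```python
def trim_common_bases(pos: int, ref: str, alt: str) -> tuple:
    """Remove bases shared by REF and ALT, shifting the position.

    Trims suffix first, then prefix; empty alleles become "-"
    (assumed conventions, not necessarily those of SVS).
    """
    ref, alt = ref.upper(), alt.upper()
    # Common trailing bases do not move the start position.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Each common leading base moves the variant one position right.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref or "-", alt or "-"


print(trim_common_bases(100, "acgt", "act"))  # (102, 'G', '-')
```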

8.0.1

New Features

  • Added function to classify variants by inheritance pattern, DNA-Seq > Classify by Inheritance Pattern. The output reports if a variant for each trio matches a specific inheritance pattern classification.

  • New function to compute the alternate read ratio for an allelic depth (AD) spreadsheet from a VCF file (DNA-Seq > Calculate Alt Read Ratio). The expected format contains all allele depths for each sample/variant in a single cell separated by a comma.

  • New feature to create a sparse matrix for plotting segment lists in a heat map in GenomeBrowse was added (Numeric > CNAM Output Analysis > Create Sparse Segment Matrix). This tool works for both CNAM Segment Lists and ROH Runs of Homozygosity spreadsheets.

  • Mac OS X build has now been repackaged as a DMG bundle instead of a zipped file.

  • The SVS application is now signed on Mac OS X. This allows it to run under the default security setting of OS X 10.8 and later.

  • Window management has now been added to SVS’s project navigator. This allows for easy switching of windows, closing all plot or GenomeBrowse windows, closing all spreadsheet windows, or closing all windows of any type.

  • Added the ability to collapse the Node Change Log and User Notes pane in the SVS project navigator window. The last saved state and relative sizes of the Log and Notes pane will be remembered and restored when opening a new SVS project.

  • Added ability to open a folder and select a file from the Data Source Library and the data console in GenomeBrowse.

  • Added messages for error states such as no network connection and information regarding an empty source container in the Data Source Library for GenomeBrowse.

  • Connectors between data points for value plots in GenomeBrowse now have size controls.

  • Interval or generic plot types in GenomeBrowse now have style controls. By default the Style field will be used if available and contains data.

  • Clicking on a heat map in GenomeBrowse now returns a list of features in the overlapping range for the specific sample.

  • Added the ability to show or hide reference allele coloring in variant maps in GenomeBrowse. Reference alleles are now not colored by default. This improves the visualization for variant maps with more than one sample.

  • Precompute and coverage computation were optimized for BAM files in GenomeBrowse. This should increase the speed of precompute and coverage as well as reduce the amount of memory required for visualizing BAM files.
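The Calculate Alt Read Ratio input packs all allele depths for a sample/variant into one comma-separated cell, with the reference depth first as in the VCF AD field. The sketch below parses such a cell. It assumes the ratio is the reads summed over all ALT alleles divided by total depth. The exact SVS definition, particularly for multiallelic sites, may differ, and the function name is hypothetical. Missing or zero-depth cells yield None.

```python
from typing import Optional


def alt_read_ratio(ad_cell: str) -> Optional[float]:
    """Fraction of reads supporting any alternate allele from an AD value
    such as "12,7" (reference depth first, then one depth per ALT)."""
    if not ad_cell or ad_cell.strip() == ".":
        return None  # missing AD value
    depths = [int(x) for x in ad_cell.split(",")]
    total = sum(depths)
    if total == 0:
        return None  # no reads, so the ratio is undefined
    return sum(depths[1:]) / total


print(alt_read_ratio("12,7"))    # 7/19, about 0.368
print(alt_read_ratio("10,3,2"))  # 5/15, about 0.333
```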

Bugs Fixed

  • Fixed Mac OS X crashes when double right-clicking in a spreadsheet viewer. Generally fixed Mac OS X stability.
  • Fixed crash when adding BED files that specified color styling to GenomeBrowse.
  • Fixed a crash when filtering sources and scrolling from a project in the Data Source Library.
  • Half-called genotypes from VCF files are now imported as missing values.
  • Fixed a bug in the “Chromosomes currently selected” list of Select > Activate by Chromosome for row-mapped, sorted spreadsheets in which not all rows are active. The chromosome selector now reports the correct number of active rows for each chromosome.
  • Fixed a bug in Select > Activate Rows by Multiple Column Criteria that prevented the sixth category of columns with six categories from being shown in the dialog, which resulted in an error dialog.
  • Fixed drawing of variant maps from row-mapped spreadsheets in GenomeBrowse.
  • Fixed restoring of minimized nodes by double-clicking in the project navigator.
  • Features that used pedigree information to construct a family lookup table did not verify that patient identifiers were unique, so results were incorrect when patient identifiers were not unique across multiple families. Canceling out of progress dialogs also did not work as intended in these features. The following features were affected and have been fixed:
    • Genotype > Quality Assurance > Mendelian Error Check
    • DNA-Seq > Score Compound Heterozygous Regions
    • DNA-Seq > Score Variants by Recessive Model
    • DNA-Seq > Find de Novo Candidate Variants
  • Fixed the verification of active pedigree columns for the following tools to not require specific column names as long as the spreadsheet is identified as a pedigree spreadsheet:
    • Genotype > Quality Assurance > Mendelian Error Check
    • DNA-Seq > Score Compound Heterozygous Regions
    • DNA-Seq > Score Variants by Recessive Model
    • DNA-Seq > Score Variants by Dominant Model
    • DNA-Seq > Find de Novo Candidate Variants
  • Fixed the download location of genome assemblies, which are now saved to the AppData User Genome Assemblies folder instead of the system Data folder.
  • Fixed a hang on Mac OS X in the following tools:
    • Genotype > PBAT Family-Based QA
    • Genotype > PBAT Genotype Analysis
    • Numeric > PBAT CNV Analysis
    • Import > Illumina > Illumina Final Report
    • Edit > Recode > Rename Marker Mapped Labels
  • Fixed error in File > Create Marker Map from Spreadsheet when the spreadsheet has a column named SNP and that column is not selected for the marker map name field.
  • LD computation no longer returns NaN values.
  • Fixed an inconsistency in the DNA_Seq menu folder label that caused some scripts not to appear in menus on Linux.
  • Fixed the project global options upgrade so that the Data Source Library, when opened from the SVS Welcome Screen with no open project, shows the correct default genome assembly in the genome selector instead of an MD5 sum.
  • Fixed DNA-Seq > Annotate and Filter when annotating against an older dbNSFP annotation source.
  • Replaced undefined proportion of variance explained with a missing value indicator (‘?’) in Genotype > Mixed Linear Model Analysis.
  • Fixed data extent computation when launching GenomeBrowse from a spreadsheet with just one marker mapped row.
  • Data extents are now computed when adding additional value fields to a plot in GenomeBrowse from spreadsheets that have not had the extents computed previously.
  • Adding an item to a plot now applies the values filter at the plot data level instead of the source level, which allows additional values to be added to the same plot from sources other than value sources.
  • Linkage Disequilibrium (LD) computations in GenomeBrowse were fixed to handle near-monomorphic markers correctly and to apply HWE correction. We have clarified all LD computation details in the new manual section Formulas for Computing Linkage Disequilibrium (LD).
  • Numeric marker map fields can no longer be included in a heat map for row-mapped spreadsheets in GenomeBrowse.
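Several of the LD fixes above concern degenerate inputs such as monomorphic markers. As a rough illustration only (this is not SVS's implementation, and it skips the HWE correction and genotype phasing that SVS performs; see Formulas for Computing Linkage Disequilibrium (LD) for the actual formulas), the sketch below assumes phased two-marker haplotype counts are already available. It computes the signed D statistic and r², and returns a missing value instead of NaN when either marker is monomorphic. The function name ld_stats is hypothetical.

```python
from typing import Optional, Tuple


def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, Optional[float]]:
    """Hypothetical helper: return (D, r2) from phased two-marker haplotype counts.

    Assumes haplotypes are already phased (no EM step, no HWE correction).
    r2 is None (a missing value) when either marker is monomorphic, because
    the denominator p_A(1-p_A)p_B(1-p_B) is then zero and r2 would be NaN.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at marker 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at marker 2
    d = p_AB - p_A * p_B           # signed D statistic
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return d, None             # monomorphic marker: report missing, not NaN
    return d, d * d / denom
```

For example, ld_stats(50, 0, 0, 50) gives D = 0.25 and r² = 1.0, while ld_stats(60, 40, 0, 0), where marker 1 is monomorphic, gives r² = None.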

Polishes

  • DNA-Seq > Annotate and Filter updates include:
    • Cancel button on individual annotation source option dialogs returns you to the annotation source selection dialog. Cancel on the annotation source dialog exits the feature completely.
    • A warning is now presented if sources from multiple builds are selected.
  • Improved memory usage for heat maps in GenomeBrowse.
  • Increased the maximum number of open files on Win64 to 2048. This should allow nearly 2000 VCF files to be imported simultaneously.
  • Renamed Genome Maps folder to Data.
  • Start and stop marker names are now added to CNAM Segment List and ROH Homozygous Runs of Length output spreadsheets.
  • Genotype > Quality Assurance > Fst By Marker output has been transposed to have one group per column. This allows for easy sorting of values to compare between groups. Also now included is a global Fst value.
  • Users are now encouraged to transpose on export of text files through File > Save As > Text or Third Party Format to optimize the export of data.
  • Removed unwanted context menu items that appeared when clicking links in the web view of the data console in Plot Viewer.
  • Default histogram color for Plot Viewer has now been changed to the first color in the color list instead of the legacy yellow color.
  • Plots are now sorted before adding to GenomeBrowse. They are now added in alphabetical order by source and then by field/plot name.
  • Y-axis zoom lock is now disabled when a numeric manual zoom range is entered in GenomeBrowse. This prevents most unexpected y-axis behavior.
  • Filtering of multi-mapped reads for BAM plots is now turned on by default.
  • Sources that generate errors are now allowed to be in the source list for Data Source Library. The sources are flagged and the error is displayed in the information pane.
  • Removed the Find and PBAT icons from the spreadsheet toolbar so that the spreadsheet size is shown by default on the Mac. The feedback button was removed from all windows except the project navigator.
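The Fst By Marker change reports per-marker values together with a global Fst. The sketch below shows one common way to relate the two. It assumes a simple Nei/Wright-style estimator, Fst = (Ht - Hs) / Ht, for biallelic markers with groups weighted equally, and combines markers into a global value as a ratio of summed numerators to summed denominators. SVS may use a different estimator (see Fixation Index Fst and Fixation Index Fst (by Marker)), and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def fst_terms(freqs: Sequence[float]) -> Tuple[float, float]:
    """Return (Ht - Hs, Ht) for one biallelic marker.

    freqs holds the allele frequency in each group; groups are weighted
    equally (an assumption, not necessarily what SVS does).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                  # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-group heterozygosity
    return h_t - h_s, h_t


def fst_by_marker(markers: Sequence[Sequence[float]]) -> Tuple[List[Optional[float]], Optional[float]]:
    """Hypothetical helper: per-marker Fst plus a global ratio-of-sums Fst."""
    per_marker: List[Optional[float]] = []
    num_sum = 0.0
    den_sum = 0.0
    for freqs in markers:
        num, den = fst_terms(freqs)
        # A marker that is monomorphic in every group has Ht = 0: report missing.
        per_marker.append(num / den if den > 0 else None)
        num_sum += num
        den_sum += den
    global_fst = num_sum / den_sum if den_sum > 0 else None
    return per_marker, global_fst
```

Averaging numerators and denominators separately, rather than averaging the per-marker ratios, keeps nearly monomorphic markers from dominating the global value.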

8.0.0

New Features

  • Fully integrated GenomeBrowse into Golden Helix SVS for visualization of genomic data, for a streamlined and intuitive user experience. This replaces the previous genome browser in SVS. As such, all genomic plots received a complete face-lift. The changes are too numerous to list here; please see GenomeBrowse: The Genomic Scale Data Visualization Tool for more information.
  • Integrated documentation of public annotation sources so the date and method of data curation are readily available.
  • If a “Stop” field is detected in a marker map, numeric data is drawn in GenomeBrowse as intervals.
  • Spreadsheet menus received an overhaul to be data type based instead of GWAS workflow based.
  • SVS manual received an overhaul to modernize the look.
  • Added a multi-source select widget API for Python scripts that prompt for annotation sources.
  • Replaced the download manager of annotation sources with the GenomeBrowse download manager for improved download management including pause and resume. Downloads will also now work in the background allowing you to continue with your analysis or visualization of data while downloads are in progress.
  • Added another LD statistics API that outputs the signed D statistic, among other statistics.
  • Added Fixation Index Fst (per Marker) computation; see Fixation Index Fst and Fixation Index Fst (by Marker).
  • Added Score Variants by Dominant Model tool to rank variants using a case/control dependent variable.
  • Added Column Sum to the column menu for numeric columns.
  • Rename Marker Mapped Labels now allows Chr#:Pos labels.
  • Added a collinearity check for linear regression in Numeric Regression Analysis.
  • Replaced the six Filter and Annotate functions with one script, DNA-Seq > Annotate and Filter Variants. This feature takes one or more data sources and presents the valid options for each source. Data is always annotated; filtering is performed using the indicated sources. See Annotate and Filter Variants for more information.
  • Variant Classification can now filter variants as well as annotate their classification.
  • Added DNA-Seq > Filter based on VCF Quality Metrics to filter genotypes based on additional sources using intervals.
  • Added option to copy the Chr#:Pos from a row/column if the data is mapped. This allows for easy plotting of the location in GenomeBrowse.
  • Added column of normalized absolute ASE values to Compute Genomic BLUP (GBLUP).
  • An Identifier field in a marker map can now be used to link out to external databases such as dbSNP or UCSC if it contains an RSID.
  • Added the ability to use network sources for analysis and filtering. However, depending on the source and internet connection, this could be considerably slower than downloading the data and accessing the data source locally.

Bugs Fixed

  • Fixed crash in Genotype Principal Component Analysis when the alleles were classified as Ref/Alt instead of by major and minor alleles.
  • Fixed a bug where the Find dialog was identifying the incorrect cell in a sorted spreadsheet.
  • Fixed the Import Sorted VCF Files importer to handle features that are missing sample level fields that are otherwise defined and present in other features.
  • Fixed KBAC functions to prevent most linear algebra errors due to reduced-model regression errors.
  • Fixed accounting for additional covariates in Compute Genomic BLUP (GBLUP). Previously the only fixed effects were the intercept/average instead of including the selected covariates.
  • Replaced undefined proportion of variance explained with the missing value indicator ‘?’ in Mixed Linear Models Analysis.
  • Fixed PBAT’s removal of families with ambiguous haplotypes.
  • Forced the project default genome to be set to GRCh_37 hg19 if an empty default genome string is detected in a project. The empty string caused the wrong default annotation sources to be loaded in the genome browser (now GenomeBrowse).
  • Fixed bug causing Numeric T-Tests to return integer overflow values in extreme cases.
  • Fixed the LD computation Python API so that it no longer returns NaN values in extreme cases.
  • Fixed the demo license expiration date shown in the notification of how many days remain.
  • Fixed LD computation to be more robust in supporting multi-allelic genotypes.
  • Fixed a bug when importing text files with a non-standard allele delimiter.

Polishes

  • Added stop field to marker map for output from DNA-Seq collapsing methods so results can be visualized as intervals in GenomeBrowse.
  • Changed default assembly back to GRCh_37 hg19 instead of GRCh_37 g1k.
  • Missing values can now be treated as Ref_Ref in DNA-Seq > Find de Novo Candidate Variants.
  • The Node Change Log and the spreadsheet viewer in SVS were restyled to have a consistent and modern look.
  • Added a more informative error message if a user tries to import Affymetrix CEL files from an Axiom array.