4.12. VarSeq Pipeline Runner

With the addition of the “Pipeline Runner” add-on to your license, VarSeq can be run from a command shell to automate pipelines and workflows.


To add the Pipeline Runner to your VarSeq license, contact info@goldenhelix.com.

4.12.1. Launching VSPipeline on Windows Operating Systems

Open the VarSeq installation directory and double-click on vspipeline.exe. This launches the VSPipeline command shell. You can also run it from a Windows cmd.exe or Cygwin shell.

4.12.2. Launching VSPipeline on Linux and RHEL Operating Systems

From a terminal, change directories to the VarSeq installation directory. Then, run ./vspipeline to launch the VSPipeline command shell. Note that it is common and supported to place the VarSeq installation directory in your $PATH so you can call vspipeline regardless of your current directory.

4.12.3. Launching VSPipeline on MacOS X

At this time, VSPipeline is not supported on MacOS X systems.

4.12.4. Downloading Annotations

Before importing projects that perform a left-align transform, you will need to have the appropriate reference sequence downloaded. You can use the bundled helper script to download these for your appropriate genome assembly build:

$ ./vspipeline -c download_annotations reference GRCh37

or for GRCh38

$ ./vspipeline -c download_annotations reference GRCh38

4.12.5. VSPipeline Command Line Arguments

A few command line arguments may be specified to modify the behavior of VSPipeline at launch time. Run vspipeline -h to display a help message describing the accepted arguments.

To execute one or more commands and exit, provide -c <command> arguments at launch time. The <command> portion of the argument may contain spaces and ends when the next argument beginning with - is encountered. Each -c flag provides exactly one command; to run multiple commands in succession, specify multiple -c arguments. The commands are executed in the order provided, after which VSPipeline exits.

If a -s argument is provided, VSPipeline will not exit after command execution and will instead provide the command shell for further interactive command execution.


If any command fails during execution (even if the failure does not halt execution), vspipeline will exit with a status code of 1.
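This exit status can be used by a wrapper script to halt a larger pipeline. The sketch below illustrates the pattern; run_vspipeline is a hypothetical stand-in for an actual "$vspipeline" -c ... invocation (here it simply simulates a failed command) so the logic can be shown without a VarSeq installation.

```shell
# run_vspipeline stands in for "$vspipeline" -c ... ; it simulates a run
# in which some command failed, so vspipeline exits with status 1.
run_vspipeline() { return 1; }

if run_vspipeline; then
    status="ok"
else
    status="failed"   # nonzero exit status: stop the surrounding pipeline here
fi
echo "pipeline status: $status"
```

In a real deployment, replace run_vspipeline with your actual vspipeline invocation and act on the failure (log, alert, abort) as appropriate.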

4.12.6. Accessing Help for Commands

From the command shell, help can be obtained for any of the commands by preceding the command with help. To get the list of available commands type help.

For example:

> help project_create

will provide help for the project_create command. With the -c argument, you can get help on a command directly.

For example:

$ ./vspipeline -c "help project_create"

Each help message will begin with a short usage line consisting of the command name followed by a list of its parameters. If there are no parameters for the command, none will be listed. Required parameters will be displayed first by their names, followed by optional parameters displayed by their names enclosed in square brackets ([]). If there are many optional parameters, [[parameters]] will be displayed.

Each of the parameters is listed in order under the section titled ‘parameters’. Each has a name, so parameters can be specified either positionally (in the correct order, without names) or in any order using their names in the manner of key=value pairs.

For example:

> import sample_type=individual files=one.vcf,two.vcf


> import one.vcf,two.vcf individual

are two ways of specifying exactly the same parameters to the import command.

Each optional parameter in the list will have a documented default value. This is the value used by the command for that parameter when none is specified.

4.12.7. Running in an Automated Environment

The most straightforward way to run vspipeline in an automated environment on Linux is to construct a Bash script: a file containing a series of commands that are executed when the script is run. Any command that can be run on the command line can be put into a Bash script, so a series of vspipeline commands can be embedded using the -c argument described above. Bash scripts are convenient for automating vspipeline workflows because they can be configured to take command line arguments, allowing the user to customize pipeline inputs and outputs.

The following is an example Bash script that creates a project, imports VCF files, and exports per-sample TSV files containing annotated and filtered variants.

Example Bash script:

#!/bin/bash

# Specify the vspipeline path (adjust to your installation directory)
vspipeline="/opt/VarSeq/vspipeline"

# Get the input and output directories from command line arguments
input_dir="$1"
output_dir="$2"

# Set the template and project directory
vs_template="Hereditary Gene Panel Starter Template"
output_project_dir="$output_dir/project"

# Build comma-separated list of vcf.gz files in the input directory
variant_vcfs=$(find "$input_dir" -name "*.vcf.gz" | paste -sd, -)

# Clear the output directory
rm -rf "$output_dir"
mkdir -p "$output_dir"

# Run vspipeline
"$vspipeline" \
    -c get_version \
    -c project_create "$output_project_dir" "$vs_template" \
    -c import "$variant_vcfs" \
    -c download_required_sources \
    -c task_wait \
    -c foreach_sample "table_export_text VariantTable ${output_dir}/{name}-variants.tsv" \
    -c get_task_list \
    -c project_save

This script takes two command line arguments:

  1. The input directory containing the input VCF files.

  2. The output directory where vspipeline will save the outputs and project files.

The script finds all vcf.gz files in the input directory and creates a new project with these files as input. This example uses the Hereditary Gene Panel Starter Template but can be customized to use any project template.
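The list-building step can be verified on its own. The sketch below reproduces how the script assembles its comma-separated VCF list, using a temporary directory with empty placeholder files in place of the real input directory:

```shell
# Stand-in input directory with two placeholder .vcf.gz files
input_dir=$(mktemp -d)
touch "$input_dir/a.vcf.gz" "$input_dir/b.vcf.gz"

# Same construction as the script: find the files and join them with commas
# (sort makes the order deterministic for this demonstration)
variant_vcfs=$(find "$input_dir" -name "*.vcf.gz" | sort | paste -sd, -)
echo "$variant_vcfs"
```

The result is a single comma-separated string suitable for the import command's files parameter.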

4.12.8. Command Specification Tips

Any command parameter that contains a space must be quoted. Quotes may be double quotes (") or single quotes ('). Nesting quotes, or quoting values within quotes, may be achieved by using single quotes within double quotes, or by escaping the nested quotes with backslashes (\\).

For example, when providing a non-trivial command as an argument to the batch command, the entire command argument should be quoted:

> batch "import one.vcf" "task_wait" "table_export_xlsx Table1 'variant output.xlsx'"

Backslashes (\\) may be used in file path parameter values on Windows systems, but keep in mind that whenever a backslash is followed by an escapable character, it is treated as an escape rather than a backslash. For this reason, double-backslashes or escaped backslashes (\\\\) should be preferred. Note that forward slashes (/) work in file paths on any system including Windows and may therefore be simpler to use in all cases.
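For example, assuming a hypothetical Windows file path, the following two import commands refer to the same file, with the forward-slash form being the simpler choice:

```
> import C:/Data/Samples/one.vcf
> import C:\\Data\\Samples\\one.vcf
```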

Commands may be split across multiple lines in two ways.

First, if a line ends with a backslash (\), a new line will be created without executing the current line. After returning from the last line without a trailing backslash, the backslashes will be stripped and the lines combined into a single command.

Second, a command can be prefixed with a >. All additional lines will be treated as parts of a single command until the next blank line is encountered.

># all of these commands are equivalent
># single line command
>project_create test_project template="Cancer Gene Panel Starter Template"
># `>` syntax
>>project_create test_project
  template="Cancer Gene Panel Starter Template"

># backslash syntax
> project_create test_project \
  template="Cancer Gene Panel Starter Template"

4.12.9. Waiting For Task Completion

To support commands such as project_update_sources, the import and update_cnv_import commands are asynchronous: they return control to VSPipeline as soon as the tasks of importing or updating a CNV table are started.

Before closing a project, ending a script or performing an export, you must add a task_wait command, which blocks until all tasks are complete and the project is in a complete state given the available inputs.
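For example, a minimal sequence that imports asynchronously and then blocks before exporting might look like this (file names are hypothetical):

```
> import one.vcf,two.vcf
> task_wait
> table_export_text VariantTable variants.tsv
```

Without the task_wait, the export could run while annotation and filtering tasks are still in progress.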

4.12.10. Deploying VSPipeline to a Production Environment

Because VSPipeline was designed to be part of a bioinformatics pipeline that may be run as one of many automated steps, it can be run from many environments when set up properly.

A standard deployment of VSPipeline may involve the following steps:

  1. Using VarSeq from any licensed workstation, create a project that will be the template for the annotations, filters and project setup you would like each batch of samples to be imported into.

  2. Save the project as a template, specifying a series name and version number for tracking updates to the project template over time. (see saveAsTemplate). Grab the corresponding “.vsproject-template” file from the ProjectTemplates folder (note the Save Project as Template dialog has a convenient hyperlink to the folder).

  3. Log into the machine that will run VSPipeline as the user that will run it (using su if necessary). Run ./VarSeq from the installed directory and log in, activate and copy in the “.vsproject-template” file (again there is a hyperlink to the folder in the New Project dialog). Create a project with some test data and make sure all required files download correctly and the project runs to completion. Any private or custom annotation sources may need to be copied into the local Annotation folder.

  4. Close VarSeq and run VSPipeline in a manner consistent with how your pipeline will execute the program, for example, with a generated “batch” script or a series of -c command arguments.

The key to step three is that VSPipeline shares all the same license state, preferences and environment as VarSeq. It should be possible to use the commands login, license_activate, license_accept_eula, and download_required_sources from the context of a project that is missing local copies of sources to configure VSPipeline on a new machine without ever running VarSeq, but it will likely be more efficient and intuitive to run through the initial setup with the full context of the GUI.

Note that VSPipeline on Linux requires a machine with X11 installed, although it should run in a shell environment without an X11 display available (i.e. as most scripts run).


Like VarSeq, VSPipeline places the current user's preferences, logged-in state and some path configurations in a text-based properties file under ~/.local/share/Golden Helix/VarSeq/User Data/vsprops.json. You can control the base path that defaults to ~/.local/share/Golden Helix/ by setting the GOLDENHELIX_USERDATA environment variable before running VSPipeline.
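For example, a wrapper script could relocate the user-data base path before launching VSPipeline; the /data/ghdata path here is a hypothetical location for illustration:

```shell
# Relocate the Golden Helix user-data base path for this shell session.
# VSPipeline (and VarSeq) will then read and write vsprops.json under it.
export GOLDENHELIX_USERDATA="/data/ghdata"
echo "$GOLDENHELIX_USERDATA"
```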

Additionally, the command set_data_folder_path can be used to define the specific data folder path otherwise configured by user preferences. For example, the catalogs folder could be changed as follows:

>set_data_folder_path catalogs /path/to/catalogs

The second argument can be one of: assemblies, references, catalogs, gene_preferences_file, liftover_chains, cached_files, and annotations.

The add_annotations_folder command can also add additional folders to the list of annotation folders used for algorithms.

Also see Linux Configuration in a Shared Environment for details on how to configure VarSeq to share data between multiple OS users on the same host.

4.12.11. A Note on Users and Logged In State

The behavior of VSPipeline is to execute the workflow engine that runs all the algorithms and filters on a project template when new input files are provided, in the same manner that VarSeq would.

In fact, the current logged in user will be recorded in the project log, and so you may want to think about which user is logged in when running VSPipeline and potentially set up a special user for this purpose.


If you do not check the “Stay logged in” setting when logging into VarSeq, or your license is configured to require logging in on each run of VarSeq, you will similarly need to use the login command on each run of VSPipeline.

You can be explicit about which user is logged in by placing a login command at the beginning of any batch script or -c argument list to VSPipeline. For the remaining examples, we will assume a user is already logged in.


VSPipeline will look for the VS_USERNAME and VS_PASSWORD environment variables and if present execute a login as the first command.
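In an automated environment, these variables can be exported before launching VSPipeline; the credential values below are placeholders for illustration:

```shell
# Supply credentials through the environment so VSPipeline executes a
# login as its first command (values here are placeholders).
export VS_USERNAME="pipeline-user@example.com"
export VS_PASSWORD="example-password"
# ./vspipeline -c ...   # would now log in automatically before other commands
echo "$VS_USERNAME"
```

Keep in mind that environment variables may be visible to other processes on the host, so a dedicated pipeline user account is advisable.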

4.12.12. Retrieving and Logging in with a Temporary License Token

A temporary 7-day license token can be retrieved and used on cloud installations. This requires that your license be configured to allow “cloud installations”. Please contact support if you would like this feature added to your license. These tokens will expire after 7 days (or when your license expires, whichever is sooner). To retrieve a license token, use the following command in VSPipeline:

>get_login_token user=test@goldenhelix.com password=test

A long alphanumeric string will be returned; this is your login token.

To login with this token use the following command:

>login_token token=eyJsaWNlbnNlIjogIkcvOWNweWhJR1Zh...

(Using your full login token)

A login token can also be obtained via a curl command in a terminal to the Golden Helix server.

$ curl --user test@goldenhelix.com:test

The components of the product_string can be found by using the get_version command.


Or from the About VarSeq dialog:

Location of product_string information.

The product string will always be “VarSeq” then a dash, then platform (here “Win64”), then another dash, then version number (here “2.1.0”), then another dash, then the release date (here “2018-10-30”).
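Given that fixed layout, the components can be split out in a shell script; this sketch parses the example string from above (note the release date itself contains dashes, so everything from the fourth field onward belongs to it):

```shell
product_string="VarSeq-Win64-2.1.0-2018-10-30"

name=$(echo "$product_string" | cut -d- -f1)            # product name
platform=$(echo "$product_string" | cut -d- -f2)        # platform
version=$(echo "$product_string" | cut -d- -f3)         # version number
release_date=$(echo "$product_string" | cut -d- -f4-)   # remaining fields: date
echo "$name $platform $version $release_date"
```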


VSPipeline will look for the VS_LOGIN_TOKEN environment variable and if present execute a login_token as the first command.

4.12.13. Example Workflow

Following is an example batch file that creates a project, imports some VCF files and exports per-sample Excel and CSV reports of annotated and filtered variants.

Example batch file:

### Created on 2015-07-29 for VarSeq version 1.2.0

# Create a project and open the project
project_create D:/Projects/ExampleProject2 template="Cancer Gene Panel Starter Template"

# Set the input file path
cd "D:/Cancer Tutorial Samples"

# Import two cancer samples and one control, specified in the tab
# delimited sampleInfo.txt file.
import Control.vcf,Sample1.vcf,Sample2.vcf sample_fields_file=sampleInfo.txt

# Download required sources if not found
download_required_sources

# Wait for everything to complete before exporting
task_wait

# Iterate through each sample and export an XLSX file and a TXT file
# of the VariantTable.
foreach_sample "table_export_xlsx VariantTable '{name}_filtered_variants.xlsx'"
foreach_sample "table_export_text VariantTable '{name}_filtered_variants.csv' header_groups=True"

# Now save the project
project_save

# Now close the project
project_close

### End of batch script


Both // and # are used to indicate comments. The comments were added to this batch file to document the commands and make the file more readable; they are not required.

This batch file can be run as follows if the file was in the current directory.

$ vspipeline -c "batch file=iterative_sample_export_batch_file.txt"

The execution will then result in three XLSX and three CSV files, created with the sample name as the file name in the specified directory. The directory will look like:

The directory after running the batch file, which creates the three XLSX and three CSV files.

4.12.14. IDs Used in Table Exports

You may have noticed the VariantTable ID used in the above export commands. Because VarSeq allows any number of tables to be created, each with its own preferences for which fields are visible and which current filters are applied (some may be locked as unfiltered, others show the final filter results, others something in between), tables can be given identifiers that are then used to specify which table in the project you want to export.

In this case, we are exporting a table with the ID VariantTable with both export commands, but you may want to export different tables with each command.

You can set your own IDs while in a project (and before saving it as a template like the one used in the above project_create command) by right-clicking on any tab and selecting the ID: [Current Name] menu item. Clicking on this menu item allows you to edit and set your own ID, which can then be referred to in that current project and its descendants when the project is saved as a template after making this change.

View and Set Identifiers via right-click menus for tabs