VarSeq Pipeline Runner

With the addition of the “Pipeline Runner” add-on to your license, VarSeq can be run from a command shell to automate pipelines and workflows.

Note

To add the Pipeline Runner to your VarSeq license, contact info@goldenhelix.com.

Launching VSPipeline on Windows Operating Systems

Open the VarSeq installation directory and double-click vspipeline.exe. This launches the VSPipeline command shell. You can also run it from a Windows cmd.exe or Cygwin shell.

Launching VSPipeline on Linux and RHEL Operating Systems

From a terminal, change to the VarSeq installation directory, then run ./vspipeline to launch the VSPipeline command shell. It is common and supported to add the VarSeq installation directory to your $PATH so that you can call vspipeline from any directory.

Launching VSPipeline on MacOS X

At this time VSPipeline is not supported on MacOS X systems.

VSPipeline Command Line Arguments

A few command line arguments may be specified to modify the behavior of VSPipeline at launch time. Run vspipeline -h to display a help message describing the accepted arguments.

To execute one or more commands and exit, provide -c <command> arguments at launch time. The <command> portion may contain spaces and ends at the next argument that starts with -; everything following -c up to that point is converted into a single command to run. Each -c flag accepts only one command. To run several commands in succession, specify multiple -c arguments; the commands are executed in the order provided, and VSPipeline then exits.

If a -s argument is provided, VSPipeline will not exit after command execution and will instead provide the command shell for further interactive command execution.
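As a sketch of how a calling script might assemble such a launch line, the Python snippet below builds an argument list with one -c flag per command. The helper name build_vspipeline_args is hypothetical, and placing -s ahead of the -c flags is an assumption; only the -c and -s flags themselves come from the description above. The list is only constructed here, not executed.

```python
from typing import List, Sequence


def build_vspipeline_args(commands: Sequence[str],
                          stay_in_shell: bool = False,
                          executable: str = "vspipeline") -> List[str]:
    """Return an argv list with one -c flag per command (hypothetical helper)."""
    args = [executable]
    if stay_in_shell:
        # -s keeps the command shell open after the commands have run.
        # Its position relative to the -c flags is an assumption.
        args.append("-s")
    for command in commands:
        # Each command travels as a single argv element after its own -c.
        args.extend(["-c", command])
    return args


args = build_vspipeline_args(["help project_create", "help import"])
print(args)
```

A list like this can be handed to subprocess.run, which avoids a second layer of shell quoting around commands that contain spaces.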

Note

If any command fails during execution (even if the failure does not halt execution), vspipeline will exit with a status code of 1.
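A calling script can use this exit status to decide whether later pipeline steps should run. The following Python sketch shows one way to do so; run_and_check is a hypothetical helper, and a small stand-in process that exits with status 1 is used in place of a real vspipeline invocation.

```python
import subprocess
import sys
from typing import Sequence


def run_and_check(argv: Sequence[str]) -> bool:
    """Run a command line and report whether it exited with status 0."""
    completed = subprocess.run(argv)
    return completed.returncode == 0


# Stand-in for a vspipeline run in which a command failed (exit status 1).
# In practice, argv would be the vspipeline executable plus its arguments.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
if not run_and_check(failing):
    print("pipeline step failed; skipping downstream steps")
```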

Accessing Help for Commands

From the command shell, help for any command can be obtained by preceding the command with help. To list the available commands, type help.

For example:

> help project_create

will provide help for the project_create command. With the -c argument, you can get help on a command directly. For example:

$ ./vspipeline -c "help project_create"

Each help message begins with a short usage line consisting of the command name followed by a list of its parameters. If the command has no parameters, none are listed. Required parameters are displayed first by name, followed by optional parameters with their names enclosed in square brackets ([]). If there are many optional parameters, [[parameters]] is displayed.

Each parameter is listed in order under the section titled ‘parameters’. Because every parameter has a name, parameters can be given either positionally, in the listed order without their names, or in any order as key=value pairs. For example:

> import sample_type=individual files=one.vcf,two.vcf

and

> import one.vcf,two.vcf individual

are two ways of specifying exactly the same parameters to the import command.
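When commands are generated by a script, the two forms can be produced by one small formatter. The Python sketch below is illustrative only: format_command is a hypothetical helper, it does no quoting, and it simply reproduces the two import lines shown above.

```python
def format_command(name: str, *positional: str, **named: str) -> str:
    """Join a command name, positional values and key=value pairs with spaces."""
    parts = [name, *positional]
    # Keyword arguments keep the order in which they were passed.
    parts.extend(f"{key}={value}" for key, value in named.items())
    return " ".join(parts)


by_name = format_command("import", sample_type="individual", files="one.vcf,two.vcf")
by_position = format_command("import", "one.vcf,two.vcf", "individual")
print(by_name)
print(by_position)
```

The named form is usually the safer choice in generated scripts, since it does not depend on remembering the parameter order.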

Each optional parameter in the list will have a documented default value. This is the value used by the command for that parameter when none is specified.

Command Specification Tips

Any command parameter that contains a space must be quoted. Quotes may be double quotes (") or single quotes ('). Nested quotes, or quoted values within quotes, can be written by using single quotes within double quotes, or by escaping the nested quotes with a backslash (\).

For example, when providing a non-trivial command as an argument to the batch command, the entire command argument should be quoted:

> batch "import one.vcf" "table_export_xlsx Table1 'variant output.xlsx'"

Backslashes (\) may be used in file path parameter values on Windows systems, but keep in mind that whenever a backslash is followed by an escapable character, it is treated as an escape rather than a literal backslash. For this reason, double (escaped) backslashes (\\) should be preferred. Note that forward slashes (/) work in file paths on every system, including Windows, and may therefore be simpler to use in all cases.
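The quoting and path advice above can be applied mechanically when commands are generated. The Python sketch below is a minimal illustration under stated assumptions: quote_value and forward_slashes are hypothetical helpers, values are assumed not to contain backslashes of their own, and the escaping of embedded single quotes follows the backslash rule described above rather than any documented VSPipeline grammar.

```python
from pathlib import PureWindowsPath


def quote_value(value: str) -> str:
    """Wrap a parameter value in single quotes if it contains a space."""
    if " " not in value:
        return value
    # Escape embedded single quotes with a backslash before wrapping.
    return "'" + value.replace("'", "\\'") + "'"


def forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes to avoid backslash escapes."""
    return PureWindowsPath(windows_path).as_posix()


export = "table_export_xlsx Table1 " + quote_value("variant output.xlsx")
project = forward_slashes(r"D:\Projects\ExampleProject2")
print(export)
print(project)
```

Single quotes are used for the inner value so that the whole command can itself be wrapped in double quotes, as in the batch example above.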

Commands may be split across multiple lines in two ways.

First, if a line ends with a backslash (\), the line is not executed and input continues on a new line. Once a line that does not end with a \ is entered, the backslashes are stripped and the lines are combined into a single command.

Second, a command can be prefixed with a >. All subsequent lines are then treated as part of the same command until the next blank line is encountered.

># all of these commands are equivalent
># single line command
>project_create test_project template="Cancer Gene Panel Starter Template"
...
># `>` syntax
>>project_create test_project
  template="Cancer Gene Panel Starter Template"

...
># backslash syntax
> project_create test_project \
  template="Cancer Gene Panel Starter Template"
...
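To make the backslash rule concrete, the Python sketch below models it for a list of input lines. This is an illustrative model of the behavior described above, not VSPipeline's actual parser; join_continued_lines is a hypothetical helper, and joining the pieces with a single space is an assumption.

```python
from typing import Iterable, List


def join_continued_lines(lines: Iterable[str]) -> List[str]:
    """Combine lines ending in a backslash with the lines that follow them."""
    commands: List[str] = []
    pending: List[str] = []
    for line in lines:
        text = line.rstrip()
        if text.endswith("\\"):
            # Strip the trailing backslash and keep collecting.
            pending.append(text[:-1].strip())
            continue
        pending.append(text.strip())
        command = " ".join(part for part in pending if part)
        if command:
            commands.append(command)
        pending = []
    return commands


joined = join_continued_lines([
    "project_create test_project \\",
    '  template="Cancer Gene Panel Starter Template"',
])
print(joined)
```

A preprocessing step like this can be handy when a long generated command has to be passed as a single -c argument.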

Deploying VSPipeline to a Production Environment

Because VSPipeline was designed to be one of many automated steps in a bioinformatics pipeline, it can be run from almost any environment once it is set up properly.

A standard deployment of VSPipeline may involve the following steps:

  1. Using VarSeq on any licensed workstation, create a project that will serve as the template for the annotations, filters and project setup into which each batch of samples will be imported.
  2. Save the project as a template, specifying a series name and version number for tracking updates to the project template over time (see Saving a Project as a Template). Copy the corresponding “.vsproject-template” file from the ProjectTemplates folder (the Save Project as Template dialog has a convenient hyperlink to the folder).
  3. Log into the machine that will run VSPipeline as the user that will run it (using su if necessary). Run ./VarSeq from the installation directory, log in, activate the license, and copy in the “.vsproject-template” file (again, the New Project dialog has a hyperlink to the folder). Create a project with some test data and make sure that all required files download correctly and that the project runs to completion. Any private or custom annotation sources may need to be copied into the local Annotation folder.
  4. Close VarSeq and run VSPipeline in a manner consistent with how your pipeline will execute the program, for example with a generated “batch” script or a series of -c command arguments.

The key to step three is that VSPipeline shares all the same license state, preferences and environment as VarSeq. It should be possible to use the commands login, license_activate, license_accept_eula, and download_required_sources from the context of a project that is missing local copies of sources to configure VSPipeline on a new machine without ever running VarSeq, but it will likely be more efficient and intuitive to run through the initial setup with the full context of the GUI.

Note that VSPipeline on Linux requires a machine with X11 installed, although it should run in a shell environment without an X11 display available (that is, the way most scripts run).

Note

Like VarSeq, VSPipeline stores the current user's preferences, logged-in state and some path configurations in a text-based properties file at ~/.local/share/Golden Helix/VarSeq/User Data/vsprops.json. You can change the base path, which defaults to ~/.local/share/Golden Helix/, by setting the GOLDENHELIX_USERDATA environment variable before running VSPipeline.

Also see Linux Configuration in a Shared Environment for details on how to configure VarSeq to share data between multiple OS users on the same host.
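When VSPipeline is launched from a script, the environment variable can be set for that one process only. The Python sketch below builds such an environment; pipeline_env is a hypothetical helper and the directory shown is a made-up example path, while GOLDENHELIX_USERDATA is the variable named above.

```python
import os
from typing import Dict, Mapping, Optional


def pipeline_env(userdata_dir: str,
                 base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and point GOLDENHELIX_USERDATA at userdata_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["GOLDENHELIX_USERDATA"] = userdata_dir
    return env


# Hypothetical location for a dedicated pipeline user's data.
env = pipeline_env("/srv/pipeline/goldenhelix", base_env={"PATH": "/usr/bin"})
print(env["GOLDENHELIX_USERDATA"])
```

The resulting dictionary can be passed as the env argument of subprocess.run, leaving the environment of the calling process unchanged.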

A Note on Users and Logged In State

VSPipeline executes the same workflow engine as VarSeq: when new input files are provided, it runs all the algorithms and filters of a project template in the same manner that VarSeq would.

In fact, the currently logged-in user is recorded in the project log, so you may want to consider which user is logged in when running VSPipeline, and potentially set up a dedicated user for this purpose.

Note

If you do not check the “Stay logged in” setting when logging into VarSeq, or your license is configured to require logging in on each run of VarSeq, you will similarly need to use the login command on each run of VSPipeline.

You can be explicit about which user is logged in by placing a login command at the beginning of any batch script or -c argument list to VSPipeline. For the remaining examples, we will assume a user is already logged in.

Example Workflow

The following example batch file creates a project, imports some VCF files, and exports per-sample Excel and CSV reports of annotated and filtered variants.

Example batch file:

### Created on 2015-07-29 for VarSeq version 1.2.0

#
# Create a project and open the project
#
project_create D:/Projects/ExampleProject2 template="Cancer Gene Panel Starter Template"

#
# Set the input file path
#
cd "D:/Cancer Tutorial Samples"

#
# Import two cancer samples and one control, specified in the tab
# delimited sampleInfo.txt file.
#
import Control.vcf,Sample1.vcf,Sample2.vcf sample_fields_file=sampleInfo.txt
#
# Download required sources if not found
#
download_required_sources

#
# Iterate through each sample and export an XLSX file and a text (CSV) file
# of the VariantTable.
#
foreach_sample "table_export_xlsx VariantTable '{name}_filtered_variants.xlsx'"
foreach_sample "table_export_text VariantTable '{name}_filtered_variants.csv' header_groups=True"

#
# Now save the project
#
project_save
#
# Now close the project
#
project_close
### End of batch script

Note

// and # are used to indicate comments. The comments were added in this batch file to provide information on the commands as well as make the file more readable. These comment lines are not required.

This batch file can be run as follows, assuming the file is in the current directory:

$ vspipeline -c "batch file=iterative_sample_export_batch_file.txt"
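A batch file like the one above can also be generated for each new set of samples. The Python sketch below is a minimal illustration that reuses only the commands shown in the example; make_batch_script is a hypothetical helper, and it assumes that the project path, file names and sample fields file contain no spaces that would need quoting.

```python
from typing import Sequence


def make_batch_script(project_path: str, template: str, input_dir: str,
                      vcf_files: Sequence[str], sample_fields_file: str) -> str:
    """Return batch script text mirroring the example workflow above."""
    lines = [
        "# Generated batch script (sketch)",
        f'project_create {project_path} template="{template}"',
        f'cd "{input_dir}"',
        f"import {','.join(vcf_files)} sample_fields_file={sample_fields_file}",
        "download_required_sources",
        # {name} is left literal here; it is filled in by foreach_sample.
        "foreach_sample \"table_export_xlsx VariantTable '{name}_filtered_variants.xlsx'\"",
        "foreach_sample \"table_export_text VariantTable '{name}_filtered_variants.csv' header_groups=True\"",
        "project_save",
        "project_close",
    ]
    return "\n".join(lines) + "\n"


script = make_batch_script(
    "D:/Projects/ExampleProject2",
    "Cancer Gene Panel Starter Template",
    "D:/Cancer Tutorial Samples",
    ["Control.vcf", "Sample1.vcf", "Sample2.vcf"],
    "sampleInfo.txt",
)
print(script)
```

The returned text can be written to a file and run with the batch command in the same way as the hand-written example.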

Execution produces three XLSX files and three CSV files in the specified directory, one of each per sample, with the sample name at the start of each file name. The directory will look like:

Figure: The directory after running the batch file, which creates the three XLSX files.

IDs Used in Table Exports

You may have noticed the VariantTable used in the export commands above. VarSeq allows any number of tables to be created, each with its own preferences for which fields are visible and which filters are currently applied (some may be locked as unfiltered, others show the output of the final filter results, and others fall in between). For this reason, tables can be given identifiers, which are then used to specify which table in the project you want to export.

In this case, we are exporting a table with the ID VariantTable with both export commands, but you may want to export different tables with each command.

You can set your own IDs while in a project (and before saving it as a template like the one used in the project_create command above) by right-clicking on any tab and selecting ID: [Current Name]. Clicking this menu item allows you to edit and set your own ID, which can then be referenced in the current project and, if you save a project template after making this change, in projects created from that template.

Figure: View and Set Identifiers via right-click menus for tabs.