Expression Editor

The Expression Editor is used in VarSeq to create new data columns from existing sources, to filter data in plot views, and, within the Annotation Convert Source Wizard, to create additional fields for inclusion in custom annotation sources.

To create new fields by evaluating an expression on an existing source, go to Add > Computed Data... and select either the Variant > Per Sample > Compute Fields or the Variant > Project/Cohort > Aggregate Compute Fields algorithm (see Compute Fields Data Transformation Examples or Aggregate Compute Fields Data Transformation Example).

To filter data within plots, select the plot name in the Plot Tree, then select Insert on the Filter tab of the Controls dialog (View > Dock Window > GenomeBrowse > Controls).

To create fields in the Convert Source Wizard, go to Tools > Manage Data Sources and click the Convert button in the lower left corner of the dialog. In the Desired Plot Type window, click the + sign along the right side of the dialog (see Select the Desired Plot Type for Delimited Text).

Expression Inputs

  • Field - The list of feature fields which are available for the current source.
  • Function - The list of functions which can be performed on field values, which include but are not limited to:
    • contains(s,b): returns true if the string s contains the string b.
    • len(s): returns the length of the string s, or the number of elements in s if it is a list.
    • split(s1,s2): returns a list created by splitting string s1 at each occurrence of string s2.
    • sum(x,y,...,z): returns the sum of all arguments.
  • Binary Operator - The list of binary operators which include but are not limited to:
    • %: mod
    • +: addition
    • -: subtraction
    • /: division
    • ^: power
    • in: check for membership in a list
    • ~: string concatenation
  • Prefix Operator - The list of operators placed before the value they modify, which include but are not limited to:
    • !: not (logical negation), syntax: !(boolean expression) inverts the boolean expression
    • (float): casts a value into a floating point number
    • (int): casts a value into an integer value
  • Postfix Operator - The list of operators placed after the value they modify, which include but are not limited to:
    • !: Factorial operator, which returns the factorial of a non-negative number
    • []: Index operator for a list, which returns the element at a particular index.
  • Constant - The list of constants which are available for the construction of filters.
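
The functions and operators listed above can be approximated in Python to make their behavior concrete. This is only an illustrative sketch: the helper names `contains`, `split` and `total` are stand-ins written for this example, not the editor's implementation, and details such as how the editor's (int) cast rounds are not stated in the text.

```python
import math

def contains(s, b):
    """Mirror of contains(s,b): True if string s contains string b."""
    return b in s

def split(s1, s2):
    """Mirror of split(s1,s2): split s1 at each occurrence of s2."""
    return s1.split(s2)

def total(*args):
    """Mirror of sum(x,y,...,z): the sum of all arguments."""
    return sum(args)

results = {
    "contains": contains("chrX", "X"),   # function: contains
    "len": len(split("A,C,T", ",")),     # function: len applied to a list
    "sum": total(1, 2, 3),               # function: sum
    "mod": 7 % 3,                        # binary operator %
    "power": 2 ** 3,                     # binary operator ^ (Python spells it **)
    "in": "X" in ["X", "Y"],             # binary operator in
    "concat": "A" + "/" + "G",           # binary operator ~ (Python uses +)
    "not": not (1 > 2),                  # prefix operator !
    "to_float": float(3),                # prefix operator (float)
    "to_int": int(3.9),                  # prefix operator (int); Python truncates
    "factorial": math.factorial(4),      # postfix operator !
}

for name, value in results.items():
    print(name, value)
```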

Operations:

  • Comparison - Common logic operators used to compare fields and their values
  • Arithmetic - Common arithmetic operators which can be applied to numeric fields
  • Logic - Logic operators which can be used to combine simple expressions into compound filters
  • Parenthesize - Adds parentheses around the current selection

Example Expressions

Expressions can be used to apply numeric threshold filters, to combine fields, or to create binary true/false tests on data. Below are some common workflow examples.

Simple Filter Example

A simple filter takes the form:

Chr == “X”

Here Chr is the name of one of the fields found in the source and “X” is the field value, which must be placed in double quotes for string-type fields. The two are separated by the == comparison operator.

Figure: Expression editor simple filter (Convert Wizard Expression Editor)

Positive Example - variants shown in this preview will be kept when the filter expression is active.

Negative Example - variants shown in this preview will be removed when the filter expression is active.
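
The effect of a simple filter on the two previews can be sketched in Python. The records and the `Pos` field below are made-up illustration data, and `simple_filter` is a hypothetical stand-in for the expression Chr == "X", not part of VarSeq.

```python
# Hypothetical records standing in for rows of a source.
records = [
    {"Chr": "X", "Pos": 100},
    {"Chr": "1", "Pos": 200},
    {"Chr": "X", "Pos": 300},
]

def simple_filter(record):
    """Python analogue of the expression: Chr == "X"."""
    return record["Chr"] == "X"

# Records that pass correspond to the Positive Example preview (kept);
# records that fail correspond to the Negative Example preview (removed).
kept = [r for r in records if simple_filter(r)]
removed = [r for r in records if not simple_filter(r)]

print("kept:", kept)
print("removed:", removed)
```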

Simple expressions can be combined into complex filters by placing each in parentheses and joining them with the logic operators.
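
As a rough Python sketch of a compound filter, two parenthesized simple tests can be joined with a logical "and". The field names, the threshold of 20 and the sample records are assumptions chosen for illustration. Python's `and` keyword stands in for whichever logic operator the editor provides.

```python
# Hypothetical records with a chromosome and a read depth field.
records = [
    {"Chr": "X", "DP": 35},
    {"Chr": "X", "DP": 8},
    {"Chr": "2", "DP": 50},
]

def compound_filter(record):
    """Two simple expressions, each in parentheses, joined by a logic operator."""
    return (record["Chr"] == "X") and (record["DP"] >= 20)

passing = [r for r in records if compound_filter(r)]
print(passing)
```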

Convert Data Source Example

Creating a Variant type annotation source requires a string Ref/Alt field that is used to match reference and alternate alleles in a data source. If you are converting data from VCF files, the Convert Wizard will automatically create the Ref/Alt field.

However, if your data comes from delimited text files, you can create the field using the Expression Editor by concatenating the separate reference and alternate allele fields with a forward slash delimiter. Data from dbNSFP is provided in this format.

Figure: Expression editor concatenate fields (Convert Wizard Expression Editor)

Once the expression is entered, click OK to add the field, then rename the new field by highlighting the “New Field 1” entry and typing in “Ref/Alt”. Renaming the field should automatically change the Detected Plot Type to Variant.
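
The concatenation described in this example amounts to joining two strings with a slash, which in the editor is done with the ~ operator. A minimal Python sketch follows. The column names `Ref` and `Alt` and the rows are assumed for illustration and may differ from the field names in your delimited text file.

```python
# Hypothetical rows from a delimited text file with separate allele columns.
rows = [
    {"Ref": "A", "Alt": "G"},
    {"Ref": "C", "Alt": "T"},
]

def ref_alt(ref, alt):
    """Join reference and alternate alleles with a forward slash delimiter."""
    return ref + "/" + alt

for row in rows:
    row["Ref/Alt"] = ref_alt(row["Ref"], row["Alt"])

print([row["Ref/Alt"] for row in rows])
```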

GenomeBrowse Plot Filter Example

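Suppose you are plotting a VCF file with quality information (e.g. FDP = Flow Evaluator read depth at the locus) and you would like to visualize only those variants with an FDP value greater than or equal to 200. Enter this threshold comparison on the FDP field in the Expression Editor to use it as a plot filter.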

Figure: Expression editor to add new fields (Expression Editor)
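
The threshold test in this example can be sketched in Python as follows. The variant records and the `id` key are invented for illustration, and `passes_depth` is a hypothetical helper name.

```python
# Hypothetical variants with an FDP (read depth) value.
variants = [
    {"id": "v1", "FDP": 150},
    {"id": "v2", "FDP": 200},
    {"id": "v3", "FDP": 480},
]

def passes_depth(variant, threshold=200):
    """True when FDP is greater than or equal to the threshold."""
    return variant["FDP"] >= threshold

visible = [v["id"] for v in variants if passes_depth(v)]
print(visible)
```

Note that a value exactly equal to 200 passes, because the comparison is "greater than or equal to".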

Compute Fields Data Transformation Examples

Suppose your data table imported from a VCF file contains a strand bias (SB) list field in which each variant has four entries representing:

  1. Number of reference reads on + strand.
  2. Number of alternate reads on + strand.
  3. Number of reference reads on - strand.
  4. Number of alternate reads on - strand.

You can calculate the forward/reverse read balance using the following formula:

\text{readBalance} = \min\left(\frac{\text{totalForwardReads}}{\text{totalReads}},\ \frac{\text{totalReverseReads}}{\text{totalReads}}\right)

Figure: Expression editor strand bias formula (Expression Editor Strand Bias)
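
The read balance formula can be worked through in Python. This sketch assumes the SB entries arrive in the order listed above, that forward reads are the two + strand counts and reverse reads are the two - strand counts, and that a site with zero total reads should yield no value. The last point is a choice made here, not something the text specifies.

```python
def read_balance(sb):
    """Compute min(forward/total, reverse/total) from a four-entry SB list.

    sb = [ref reads on +, alt reads on +, ref reads on -, alt reads on -]
    """
    total_forward = sb[0] + sb[1]
    total_reverse = sb[2] + sb[3]
    total_reads = total_forward + total_reverse
    if total_reads == 0:
        return None  # assumption: balance is undefined without reads
    return min(total_forward / total_reads, total_reverse / total_reads)

# 15 forward reads and 35 reverse reads out of 50 in total.
balance = read_balance([10, 5, 20, 15])
print(balance)
```

A perfectly balanced site gives 0.5, and the value approaches 0 as the reads concentrate on one strand.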

If you want to exclude multi-allelic sites (more than one alternate allele) from analysis, you can create a True/False field with the Expression Editor.

Figure: Expression editor multi-allelic sites (Expression Editor Multi-allelic sites)
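
The multi-allelic test reduces to counting alternate alleles. The Python sketch below assumes the alternates are available either as a list or as a comma-separated string. How the field is actually stored in your project is not stated here, and `is_multiallelic` is a hypothetical helper name.

```python
def is_multiallelic(alts):
    """True when a site has more than one alternate allele.

    Accepts a list of alleles or a comma-separated string (assumed formats).
    """
    if isinstance(alts, str):
        alts = alts.split(",")
    return len(alts) > 1

flags = [is_multiallelic(a) for a in (["G"], ["G", "T"], "A,C", "T")]
print(flags)
```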

If you want to test whether the genotype for your sample(s) at each locus is heterozygous, you can use the Compute Fields algorithm.

Figure: Expression editor checking for heterozygosity (Expression Editor Het Checking)
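
One way to picture a heterozygosity test is shown below in Python. It assumes genotypes are VCF-style strings such as "0/1" or "0|1", which may not match how genotype fields are represented in your project, and it treats missing calls as not heterozygous. `is_heterozygous` is a hypothetical helper, not a VarSeq function.

```python
import re

def is_heterozygous(gt):
    """True when a diploid VCF-style genotype has two different alleles."""
    alleles = re.split(r"[/|]", gt)  # accept unphased (/) and phased (|) calls
    if len(alleles) != 2 or "." in alleles:
        return False  # assumption: missing or non-diploid calls are not het
    return alleles[0] != alleles[1]

calls = {gt: is_heterozygous(gt) for gt in ["0/1", "1|0", "1/1", "0/0", "./.", "1/2"]}
print(calls)
```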

Aggregate Compute Fields Data Transformation Example

If you have multiple samples in your project and would like to compute summary information for your sample-specific fields, use the Aggregate Compute Fields algorithm. For example, you could compute the minimum Read Depth (DP) across all samples at a particular locus.

Figure: Expression editor summary sample information (Expression Editor Summary Sample Information)
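
The aggregate computation in this example can be sketched in Python as a per-locus minimum over samples. The locus keys, sample names and DP values are invented, and skipping samples with a missing value is an assumption of this sketch rather than documented behavior.

```python
# Hypothetical per-sample DP values keyed by locus.
depth_by_locus = {
    "1:1000": {"SampleA": 42, "SampleB": 17, "SampleC": 88},
    "1:2000": {"SampleA": 30, "SampleB": None, "SampleC": 25},
}

def min_depth(sample_depths):
    """Minimum DP across samples at one locus, ignoring missing values."""
    present = [d for d in sample_depths.values() if d is not None]
    return min(present) if present else None

summary = {locus: min_depth(depths) for locus, depths in depth_by_locus.items()}
print(summary)
```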