This documentation will cover the steps required to get up and running with Sentieon to process your FASTQs into VCFs and BAMs. The main steps are ensuring you have a compatible operating system, downloading Sentieon's tools and Golden Helix's associated starter scripts, creating a license server, and building a starter pipeline.
Sentieon is designed to run on Linux and other POSIX-compatible platforms. Sentieon supports several Linux distributions, Apple OSX, and WSL2 for Windows users (NOTE: Sentieon no longer supports Cygwin for Windows users).
The following Linux distributions are recommended for Linux and WSL2:
- RedHat/CentOS 6.5
- Debian 7.7
- OpenSUSE 13.2
- Ubuntu 14.04
For Apple OSX, OSX 10.9 (Mavericks) or higher is recommended.
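To check which of these platforms you are on, the bash sketch below classifies the machine from uname output and then prints the distribution or macOS version. The classify_platform helper is hypothetical and is not part of Sentieon or the Golden Helix scripts. It assumes that WSL2 kernels report "microsoft-standard" in uname -r and that WSL1 kernels report "Microsoft". Custom kernels may not follow that pattern.

```shell
# Hypothetical helper: classify the platform from `uname -s` and `uname -r`.
# Assumption: WSL2 kernel releases contain "microsoft-standard"
# (e.g. 5.15.90.1-microsoft-standard-WSL2), WSL1 releases contain "Microsoft".
classify_platform() {
  local kernel_name="$1" kernel_release="$2"
  case "$kernel_name" in
    Darwin)
      echo "macos"
      ;;
    Linux)
      case "$kernel_release" in
        *microsoft-standard*) echo "wsl2" ;;
        *Microsoft*) echo "wsl1" ;;
        *) echo "linux" ;;
      esac
      ;;
    CYGWIN*|MINGW*|MSYS*)
      # Cygwin is no longer supported by Sentieon.
      echo "unsupported-windows"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}

platform=$(classify_platform "$(uname -s)" "$(uname -r)")
echo "Detected platform: $platform"
if [ "$platform" = "macos" ]; then
  echo "macOS version: $(sw_vers -productVersion)"
elif [ -r /etc/os-release ]; then
  # Read NAME and VERSION_ID in a subshell so they do not leak into this shell.
  ( . /etc/os-release; echo "Distribution: ${NAME:-unknown} ${VERSION_ID:-}" )
elif [ -r /etc/redhat-release ]; then
  # Older RedHat/CentOS releases (e.g. 6.x) lack /etc/os-release.
  cat /etc/redhat-release
fi
```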
Sentieon requires at least 16 GB of memory, but 64 GB is recommended, especially for whole exome or whole genome sequencing. There are, however, options for throttling Sentieon's memory usage. With respect to CPU cores, Sentieon's algorithms are highly parallelized, and their runtime has been shown to improve nearly linearly as more computational threads are made available.
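A quick way to compare a Linux or WSL2 machine against these requirements is sketched below. It reads MemTotal from /proc/meminfo, so it does not apply to macOS, and it counts online cores with getconf. The classify_memory_kb helper is hypothetical. The 5% tolerance is an assumption meant to account for MemTotal reporting slightly less than the installed RAM.

```shell
# Hypothetical helper: compare memory (in kB, as in /proc/meminfo) against
# the 16 GB minimum and 64 GB recommendation.
# Assumption: apply a 5% tolerance because MemTotal excludes kernel-reserved RAM.
classify_memory_kb() {
  local mem_kb="$1"
  local gib_kb=$((1024 * 1024))
  local min_kb=$((16 * gib_kb * 95 / 100))
  local rec_kb=$((64 * gib_kb * 95 / 100))
  if [ "$mem_kb" -ge "$rec_kb" ]; then
    echo "recommended"
  elif [ "$mem_kb" -ge "$min_kb" ]; then
    echo "minimum"
  else
    echo "below-minimum"
  fi
}

if [ -r /proc/meminfo ]; then
  # MemTotal is reported in kB, e.g. "MemTotal:  16318000 kB".
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # Round to the nearest GiB for display.
  echo "Memory: about $(( (mem_kb + 524288) / 1048576 )) GiB ($(classify_memory_kb "$mem_kb"))"
fi
# getconf is available on both Linux and macOS.
echo "Online CPU cores: $(getconf _NPROCESSORS_ONLN)"
```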
Installing WSL2 (Windows Only)
If you are a Windows user and do not yet have Windows Subsystem for Linux 2 (WSL2) installed on your machine, please follow the instructions in the Windows Installation tab.
If your network uses a proxy, you will need to update the proxy settings so that the scripts can access the internet. To do this, set the HTTP_PROXY and HTTPS_PROXY environment variables.
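The template for setting these variables looks like this:
export HTTP_PROXY="http://USER:PASSWORD@PROXY_HOST:PORT"
export HTTPS_PROXY="http://USER:PASSWORD@PROXY_HOST:PORT"
This might look like the following:
export HTTP_PROXY="http://user:password@proxy.example.com:3128"
export HTTPS_PROXY="http://user:password@proxy.example.com:3128"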
These lines should be added to the ~/.profile file or, if that file doesn't exist, to the ~/.bashrc file. After adding the lines, source the file to update these variables in your working environment:
source ~/.profile
or, if you edited ~/.bashrc:
source ~/.bashrc
This will allow the utilities and scripts that follow to access the internet.
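The proxy steps above can also be scripted. This bash sketch picks ~/.profile if it exists and otherwise falls back to ~/.bashrc, then appends the two export lines only if they are not already present. The helper names and the proxy URL are placeholders, not part of the Golden Helix scripts. The usage lines are commented out so that nothing is changed until you substitute your own proxy.

```shell
# Hypothetical helper: choose ~/.profile if it exists, otherwise ~/.bashrc.
pick_profile_file() {
  local home_dir="${1:-$HOME}"
  if [ -f "$home_dir/.profile" ]; then
    echo "$home_dir/.profile"
  else
    echo "$home_dir/.bashrc"
  fi
}

# Hypothetical helper: append both export lines once. If HTTP_PROXY is
# already exported in the file, leave it unchanged.
add_proxy_exports() {
  local file="$1" proxy_url="$2"
  if grep -q '^export HTTP_PROXY=' "$file" 2>/dev/null; then
    echo "Proxy variables already set in $file"
    return 0
  fi
  {
    echo "export HTTP_PROXY=\"$proxy_url\""
    echo "export HTTPS_PROXY=\"$proxy_url\""
  } >> "$file"
}

# Usage (placeholder URL; replace it with your proxy before uncommenting):
# profile=$(pick_profile_file)
# add_proxy_exports "$profile" "http://user:password@proxy.example.com:3128"
# source "$profile"
```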
You need to have git installed to download the scripts. Install it with the command for your distribution:
On RedHat/CentOS:
sudo yum install git -y
On Debian/Ubuntu:
sudo apt-get install git -y
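If you are not sure which of these commands applies, the sketch below checks whether git is already installed and, if not, whether apt-get or yum is on the PATH. It prints the matching command instead of running it, so you can review it before using sudo. The pick_pkg_manager helper is hypothetical and only recognizes the two package managers named above.

```shell
# Hypothetical helper: report which package manager is available
# (apt-get on Debian/Ubuntu, yum on RedHat/CentOS); fail if neither is found.
pick_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt-get"
  elif command -v yum >/dev/null 2>&1; then
    echo "yum"
  else
    return 1
  fi
}

if command -v git >/dev/null 2>&1; then
  echo "git is already installed: $(git --version)"
elif mgr=$(pick_pkg_manager); then
  # Print rather than execute, so the sudo command can be reviewed first.
  echo "Run: sudo $mgr install git -y"
else
  echo "Neither apt-get nor yum was found; install git with your system's package manager."
fi
```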
The scripts are stored in a git repository. To download them, clone the repository with git:
git clone https://goldenhelix.kilnhg.com/Code/Public/Secondary/Secondary-Analysis.git
This will create a directory called Secondary-Analysis. In this directory there will be a collection of scripts to get you started calling variants with Sentieon.
Once the scripts have been downloaded, you can check for updates at any time by running the update script:
This will update all of the scripts to the latest versions.
The next step in the setup is to download the Sentieon software, as well as the reference sequence used to align reads and call variants.
To download the Sentieon tools execute the following command:
This will download the latest version of the Sentieon tools. To update to the latest version in the future, rerun the script; the new version will overwrite the one saved in tools/sentieon.
The next step is to download the reference tracks for variant calling. This can take quite a bit of time, as the reference sequence and its accompanying files are over 8 GB. When it completes, the reference will be in the resources folder, and the Sentieon and VarSeq software will be in the tools folder. To download the references (GRCh37 and hg38), run the following commands:
The GRCh37 script downloads the 1000 Genomes GRCh37 decoy reference, and the hg38 script downloads the Broad hg38 reference files. Other references can be used as long as they have been indexed for use with BWA. Please contact Golden Helix support if you would like help setting up a different reference.
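Before starting the large reference download, or before pointing the pipeline at a reference of your own, a couple of sanity checks may help. This bash sketch verifies that the filesystem holding the current directory has free space, and that a FASTA file has the .amb, .ann, .bwt, .pac, and .sa files that bwa index writes. Both helper names are hypothetical. The 10 GB threshold is an assumed margin over the 8 GB download, and the script assumes the resources folder lives under the directory you run it from.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# REQUIRED_KB kilobytes available. `df -Pk` prints POSIX-format output in
# 1024-byte blocks; field 4 of the second line is the available space.
has_free_space_kb() {
  local dir="$1" required_kb="$2" avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$required_kb" ]
}

# Hypothetical helper: succeed if REF has the five index files that
# `bwa index REF` writes next to it.
check_bwa_index() {
  local ref="$1" ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -f "$ref.$ext" ]; then
      echo "Missing BWA index file: $ref.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: 10 GB leaves some headroom over the 8 GB reference download.
required_gb=10
if has_free_space_kb . $((required_gb * 1024 * 1024)); then
  echo "At least ${required_gb} GB free in $(pwd)"
else
  echo "Less than ${required_gb} GB free in $(pwd)" >&2
fi

# Example for a custom reference (hypothetical path):
# check_bwa_index /path/to/custom_ref.fa || bwa index /path/to/custom_ref.fa
```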
When the downloads have finished, proceed to Licensing to generate a request for a license.