The Import menu presents various menu items for importing different types of data:
The methods for importing data are described in the following sections.
Text files such as comma, space or tab delimited files that are saved with extensions .csv, .txt, or .dat can be imported into SVS using the Import Text File dialog. This dialog has two tabs associated with it, Input File, which specifies general dataset import options, and Advanced Options. See Text File Import Window.
Input File: After opening the Import Text window, select a file by clicking on the Browse button, which allows you to navigate your file system.
You must specify how a file is delimited in order to properly import the file. If the wrong delimiter is specified, a warning message will indicate the file may be using a different delimiter.
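Before opening the import dialog, it can help to check which delimiter a file actually uses. The following is a minimal sketch, not part of SVS: the helper name `guess_delimiter` and the sample rows are hypothetical, and the heuristic simply picks the candidate character that splits every previewed line into the same number of fields.

```python
def guess_delimiter(lines, candidates=(",", "\t", " ")):
    """Return the candidate that splits every line into the same number (>1) of fields.

    Returns None when no candidate gives a consistent split.
    """
    best, best_fields = None, 1
    for delim in candidates:
        field_counts = {len(line.split(delim)) for line in lines}
        if len(field_counts) == 1:
            n_fields = field_counts.pop()
            if n_fields > best_fields:
                best, best_fields = delim, n_fields
    return best


# Hypothetical preview of the first lines of a text file.
preview = ["Sample,Age,rs123", "S1,34,A_G", "S2,41,G_G"]
print(repr(guess_delimiter(preview)))  # ','
```

A result of None suggests the file uses some other delimiter or has ragged rows and should be inspected by hand.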
The dataset name may be given at this time. This name will be applied to both the dataset node as well as the spreadsheet viewer node. The spreadsheet and parent node can be renamed after import.
If the text file has a row label column, that column can be specified, or generic row labels can be created. A row label is generally a sample name or information to identify a row and not generally used for analysis. If a text file is imported with the wrong column specified for row labels this can be changed by using the Spreadsheet Editor (see Editing the Row Label Header and Row Labels) without needing to re-import the file. The default is to use the first column of data as row labels.
Advanced Options: The Advanced Options Tab allows you to specify a custom encoding list for missing data if your text file uses different characters besides an empty field, period, comma, ?, or --- (three dashes). In the custom encoding box, enter a whitespace delimited list of missing value encodings. This list will overwrite the built-in missing encoding list except for the empty field.
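The rule described above (a custom list replaces the built-in encodings, but the empty field is always treated as missing) can be illustrated with a short sketch. This mirrors the documented behavior only; it is not the SVS implementation, and the function names are hypothetical.

```python
# Built-in missing encodings as listed in the text: empty field, period,
# comma, question mark, and three dashes.
BUILT_IN_MISSING = {"", ".", ",", "?", "---"}


def missing_codes(custom=None):
    """Return the set of strings treated as missing.

    custom is a whitespace-delimited string of encodings, or None/empty
    to keep the built-in list. The empty field is always included.
    """
    if not custom or not custom.split():
        return set(BUILT_IN_MISSING)
    return {""} | set(custom.split())


def is_missing(value, codes):
    """True if a field value (ignoring surrounding spaces) is a missing code."""
    return value.strip() in codes


codes = missing_codes("NA -99")
print(is_missing("NA", codes), is_missing("?", codes), is_missing("", codes))
```

With the custom list "NA -99", the question mark is no longer treated as missing, while the empty field still is.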
If genotype data exists in the text file you can specify whether or not the program should read the data as genotypic. If you un-check the “Read Genotypic Data” box, then all genotype data will be read as categorical. The allele delimiter character can be specified by choosing from the drop-down menu or by choosing “Other ->” and indicating the character in the text box to the right of the menu. If there are non-genotype fields that have an underscore in the field, these columns will be read as genotypic. These columns can be changed to categorical using the spreadsheet editor after the file is imported into the project. The default behavior is to read all fields containing an underscore as genotype data. Columns with all missing values can be encoded as Genotypic by checking “Encode columns with all missing data as genotypic.”
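Because any field containing the allele delimiter can cause a column to be read as genotypic, a quick pre-import check can list the columns worth reviewing. This is a sketch under the assumption that the file has already been split into a header and rows; the helper name and the sample data are hypothetical.

```python
def columns_with_delimiter(header, rows, allele_delim="_"):
    """List column names where at least one value contains the allele delimiter."""
    flagged = []
    for index, name in enumerate(header):
        if any(allele_delim in row[index] for row in rows):
            flagged.append(name)
    return flagged


header = ["Sample", "Site", "rs1"]
rows = [
    ["S1", "clinic_a", "A_G"],
    ["S2", "clinic_b", "G_G"],
]
# "Site" is not genotype data but contains underscores, so it is flagged too.
print(columns_with_delimiter(header, rows))
```

Flagged non-genotype columns can either be renamed in the source file or changed to categorical in the spreadsheet editor after import.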
Header lines can be skipped by checking the “Skip” box and selecting the number of rows to skip. The default is to not skip any header lines.
The Base numeric type default is Boolean. This means that if a column of all 0’s was detected, it would be encoded as Boolean. You can also choose Integer, Single or Double precision float for the default type. There is also an option to encode real columns with single precision floats (as opposed to double precision). The values would then be stored in 4 bytes rather than 8 bytes.
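The trade-off between single and double precision can be seen with a few lines of standard-library Python. This is only an illustration of IEEE 754 storage sizes and rounding; the helper names are hypothetical and say nothing about how SVS stores columns internally.

```python
import struct

# Standard sizes of IEEE 754 single and double precision values.
SINGLE_BYTES = struct.calcsize("<f")  # 4
DOUBLE_BYTES = struct.calcsize("<d")  # 8


def column_bytes(n_rows, bytes_per_value):
    """Approximate storage for one real-valued column."""
    return n_rows * bytes_per_value


def round_trip_single(value):
    """Store a Python float as single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(column_bytes(1_000_000, SINGLE_BYTES), column_bytes(1_000_000, DOUBLE_BYTES))
# 0.5 is exactly representable in single precision; 0.1 is not.
print(round_trip_single(0.5) == 0.5, round_trip_single(0.1) == 0.1)
```

Single precision halves the storage for real columns at the cost of roughly seven significant decimal digits instead of roughly sixteen.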
Most file formats of statistical and data management programs can be read using the Import Third Party Formats dialog. To import from a file, select the file and format you wish to use by clicking on the Browse button. The files are filtered by file format at the bottom of the Select File dialog. Any of the file types listed in the drop down menu can be imported for use in Golden Helix SVS. After a file has been selected, there are options for specifying column names and the allele delimiter. See Third Party File Import Window.
Once a file has been selected, clicking Next > will lead to steps appropriate to the file type. If the file format allows for more than one worksheet, such as in Microsoft Excel ™, one sheet will have to be selected for import at a time. Other file formats or data sources might have additional steps. A row label column can be selected from the list of available columns or generic row label columns can be generated.
Golden Helix SVS can import plain text PED/MAP files, plain text TPED/TFAM files, and optimized binary PED or BED files (which should have corresponding BIM/FAM files). Because these files have full marker map information for each SNP, the marker map is also imported into Golden Helix SVS and automatically applied to the dataset.
In the Import PED/TPED/BED File window you can select the Text PED/MAP format, the Text TPED/TFAM format, or the Binary BED/FAM/BIM format. If you select the PED/MAP format, you can browse for a PED file by clicking on Browse. A corresponding MAP file will be automatically filled in, but you may choose to browse for a different MAP file. Similarly, if you select the TPED/TFAM format, you can browse for a TPED file and the corresponding TFAM file will be automatically filled in. If you select the BED/FAM/BIM format, you can browse for a BED file and the corresponding FAM/BIM files will be automatically filled in.
For all import formats, you can specify a dataset name, but you can only specify encoding options for the PED/MAP and TPED/TFAM formats. The default is to encode missing phenotypes in the Sex or Affection Status column as “-9” and missing genotypes as “0”. If your PED, TPED, and/or TFAM file has a different encoding you can specify it in the “Missing genotype” or “Missing phenotype” fields. Affection status encoding can also be specified, allowing either 1 or 0 to designate unaffected individuals in the data.
The final option allows you to specify whether you are importing human genome data or non-human genome data. If you are importing non-human genome data, the number of autosomal chromosomes in the data must be specified.
Note
The first four columns of the PED, TFAM, and FAM file formats are identifiers and encode missing values with the “0” string.
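The missing-value conventions above can be made concrete with a small parser for one line of a text PED file. This is a sketch, not the SVS importer: it assumes the common whitespace-delimited PED layout (family ID, individual ID, father ID, mother ID, sex, affection status, then two allele fields per marker), and the function name and sample line are hypothetical.

```python
def parse_ped_line(line, missing_pheno="-9", missing_geno="0"):
    """Parse one whitespace-delimited PED line into a dictionary.

    Missing identifiers ("0"), missing sex/status, and missing genotypes
    are all returned as None.
    """
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a PED line needs at least six fields")
    family, individual, father, mother, sex, status = fields[:6]
    alleles = fields[6:]
    if len(alleles) % 2:
        raise ValueError("odd number of allele fields")
    genotypes = []
    for first, second in zip(alleles[0::2], alleles[1::2]):
        if missing_geno in (first, second):
            genotypes.append(None)
        else:
            genotypes.append(f"{first}_{second}")
    return {
        "family": family,
        "individual": individual,
        "father": None if father == "0" else father,
        "mother": None if mother == "0" else mother,
        "sex": None if sex == missing_pheno else sex,
        "status": None if status == missing_pheno else status,
        "genotypes": genotypes,
    }


print(parse_ped_line("FAM1 IND1 0 0 1 -9 A G 0 0"))
```

If a file uses different encodings, the same values would be passed as `missing_pheno` and `missing_geno`, matching what is entered in the “Missing phenotype” and “Missing genotype” fields.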
The Dataset Storage Format (DSF) is designed to allow for the sharing and collaboration of datasets between Golden Helix SVS users. The DSF format is also open to third-parties to develop the ability to create DSF files from their own products or data sources and thus more easily integrate with Golden Helix SVS.
DSF files can be imported into SVS in two ways. The first way is through the Import > Golden Helix DSF dialog which prompts the user to select a single DSF file to import. The second way is to select one or more DSF files in a file system browser and drag and drop the DSF files into an open Golden Helix SVS project. The files must be dropped in the Project Navigator Window.
Note
As of SVS version 7.6.0 multiple DSF files can be selected for import by selecting multiple files in a file system browser and dragging and dropping them in an open project.
This action still imports one file at a time, but additional files are imported immediately after the previous file has been completely imported into the project.
Legacy GHD files can be imported into Golden Helix SVS through a dialog that lets you navigate to the GHD file and select it for import. The dataset name is retained from when the GHD file was created, but an applied marker map is not preserved.
Public data, such as HapMap data, can be downloaded and imported directly into Golden Helix SVS. The available datasets are listed in a dialog. The dataset name is retained from the selection menu, but a marker map is not applied.
Both SNP and CNV data from Affymetrix chips can be imported into an SVS project for analysis. SNP data can be imported in either the CHP or CNT formats. CNV data can be imported in the CEL, CNT and CNCHP formats.
Analysis results generated by the Affymetrix Chromosome Analysis Suite (ChAS) in the CYCHP format and CEL files generated by processing the Cytogenetics Whole-Genome Arrays can also be imported into an SVS project for analysis.
For the Affymetrix 500k, SNP 5.0, and SNP 6.0 arrays, Golden Helix SVS supports reading CEL intensity files and calculating normalized log2 ratios for copy number segmentation and association analysis. For the Affymetrix 10k, 100k, and 500k arrays, you may use the Affymetrix CNAT Batch Analysis tool to create CNT files; for the 100k, 500k, and SNP 6.0 arrays, use the Genotyping Console to create CNCHP files. These files contain normalized log2 ratios and can be imported into a dataset for analysis in SVS. See Extracting Affymetrix Copy Number Data for use in SVS for instructions on creating CNT or CNCHP files using Affymetrix tools. Affymetrix CEL, CNT, or CNCHP files can be imported directly into Golden Helix SVS versions 7.0 and higher without any additional steps.
Golden Helix SVS can directly import CHP files from the Affymetrix 10k, 100k, 500k, 5.0 and 6.0 GeneChip® mapping arrays.
For mapping arrays prior to the SNP 5.0 and SNP 6.0 arrays, you must have the corresponding library file installed for each type of mapping array you want to import. SNP 5.0 and 6.0 mapping arrays do not require library files. If you have GCOS installed, it is likely the library files are already installed in either C:/GeneChip/Library or C:/GeneChip/Affy_Data/Library (on Windows).
If you do not have GCOS installed, or need other Library files, they can be downloaded from Affymetrix through the NetAffx service. See Genetic Marker Maps and Affymetrix Library Files for importing library files through NetAffx.
By default, Golden Helix SVS looks for mapping array library files in the C:/Program Files/Golden Helix SVS/AffyLibraryFiles directory or in the last directory used for library files. The directory can be specified whenever a library file is needed for importing Affymetrix files.
The library files available from the NetAffx service are for the final versions of these mapping arrays. If you were using an experimental early access array, you will need to get the appropriate library files from Affymetrix. All that is needed for Golden Helix SVS is the CDF file for the array.
To import Affymetrix CHP files you can either select all of the files to import using Add Files from the Import CHP Files... dialog, or add an entire directory by choosing Add Directory. If you wish to remove CHP files from the list in the File box, select the files to remove and click Remove. Multiple selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select individual files. See Affymetrix CHP File Import Window.
Note
The “100k” and “500k” arrays are composed of two “50k” and two “250k” chips, respectively, with their corresponding library files. The two chips need to be imported separately and then joined together from the spreadsheet File menu. See Joining or Merging Spreadsheets for instructions on how to join the datasets from the two chips.
The other options in this import window include specifying a dataset name, changing the library path, and filtering calls based on confidence score (p-value). When importing CHP files from SNP 5.0 or 6.0 arrays, the library file location can be ignored, as it is not needed for the import process.
If you wish to use a different threshold for the confidence score, check the box and fill in the desired confidence score (a number between 0 and 1). Changing the confidence score is only valid for certain, more recent file types such as 100k or 500k CHP files. During the import process Golden Helix SVS will screen whether changing the confidence score is valid for your particular files.
Note
The Affymetrix CEL import tool reads CEL intensity files, normalizes the intensity values against the chosen or default reference samples, and imports the normalized log2 ratios into Golden Helix SVS. The methodology for calculating and normalizing log2 ratios from the CEL files is described in the Quantile Normalization of Affymetrix CEL Files section.
From the Import Affymetrix CEL file dialog (see Affymetrix CEL File Import Window), first select the CEL files you want to include in the dataset. For Mapping 500k data, you must select files from both the NSP and STY arrays for each sample. To select CEL files, click the Add Files button and use the file browser to select multiple CEL files. The CEL files you selected will appear in the CEL import dialog window. You may add all of the CEL files in a directory by using the Add Directory button. To remove CEL files from the window, select the unwanted samples and click Remove. You may continue adding CEL files by clicking the Add Files or the Add Directory buttons again. Multiple selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select several individual files.
In the next window, specific import options can be specified.
For the import of Mapping 500k CEL files, a matching spreadsheet containing the file names must be available in Golden Helix SVS. This spreadsheet will tell the CEL import tool how to join the NSP and STY samples together to create one sample per patient. The matching spreadsheet should have a row label column and at least two data columns. The row labels should be the sample names. The first and second columns should be the NSP and STY file names. Other columns in the dataset are optional but may contain the reference status for the sample. For Mapping 500k CEL file import, check the 500k NSP/STY Matching box and select the matching spreadsheet by clicking on Select Sheet.
The default reference set includes all samples. Another option is to select a subset from a spreadsheet containing the Reference Status for the samples. The row labels should match the sample names. For the SNP 5.0 and SNP 6.0 Array, the row labels should be the file names of the CEL files with the CEL extension removed. The reference status column should contain 0’s and 1’s where 0 denotes reference and 1 denotes non-reference status. All of the samples will be normalized against the reference samples. When a spreadsheet is selected, the 0=Ref 1=Non-Ref Column drop down box will contain the names of columns of binary data in the selected spreadsheet. Select the name of the column to be used as the reference status.
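A reference status table for SNP 5.0 or SNP 6.0 data can be prepared outside SVS and then imported as a spreadsheet. The sketch below builds the rows from a list of CEL files, using the conventions stated above (row label is the file name without the CEL extension; 0 denotes reference, 1 denotes non-reference). The function name and file paths are hypothetical.

```python
from pathlib import Path


def reference_status_rows(cel_files, reference_files):
    """Return (row_label, status) pairs, with 0 = reference and 1 = non-reference."""
    reference_labels = {Path(name).stem for name in reference_files}
    rows = []
    for name in cel_files:
        label = Path(name).stem  # file name with the .CEL extension removed
        rows.append((label, 0 if label in reference_labels else 1))
    return rows


cel_files = ["arrays/NA10851.CEL", "arrays/Tumor_01.CEL"]
reference_files = ["arrays/NA10851.CEL"]
for label, status in reference_status_rows(cel_files, reference_files):
    print(f"{label}\t{status}")
```

The printed tab-delimited rows can be saved to a text file and imported so that the status column appears in the 0=Ref 1=Non-Ref Column drop-down box.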
You also have the option to omit samples with the reference designation from the final output spreadsheet. To do this, check the appropriately named box. If this option is selected, reference samples will be used in normalization of data and calculation of LogR values, but will not be included in the output spreadsheet.
Another reference set option is to use HapMap precomputed populations. All 270 samples or an ethnic subset can be used.
A Marker Map needs to be selected for use in the analysis. Probes that are not contained in the marker map will not be imported. In other words, if the marker map does not contain copy number probes and the CEL files do, those probes will not be in the resulting spreadsheet. The CEL files are scanned prior to this dialog and the appropriate marker map will be detected and auto-downloaded. If the marker map has already been downloaded, navigate and select it by clicking on Select Marker Map.
The Library Path where the CDF library files are located is also automatically detected. The directory can be changed by clicking on View Library Folder. The library files should contain both SNP and CN Probes.
You may optionally select a temporary directory where intermediate DSF files will be stored. If your project is located on a shared network drive, for performance reasons you should specify a Temp Directory on a local disk.
Output options include the A and B allele values before quantile normalization, the A and B allele values after quantile normalization but before the log ratios are computed, and the LogR ratios with samples arranged column-wise and row-wise.
The name of the Dataset can be specified at this time; the default is to name the dataset “Affy CEL Dataset”.
Note
The Affymetrix CNT import tool converts multiple CNT files into one aggregate spreadsheet that contains the log2 ratio values in a format ready to be used for analysis. CNT files can be created for the Mapping 10k, 100k, and 500k arrays or for any copy number data that can be converted into a text file. See Creating CNT Files using the Affymetrix CNAT Batch Analysis Tool and Affymetrix CNT File Format for information on creating Affymetrix CNT files.
From the Import CNT Files... dialog (see CNT File Import Window), you can click Add Files to select CNT files to convert. This will open a file chooser where you can select one or more CNT files. The CNT files you selected will appear in the CNT file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CNT files from the window, select the unwanted files and click Remove. You may continue adding CNT files by clicking the Add Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the same import.
You can also change the name of the dataset at this time.
Note
Row labels in the output spreadsheet will be determined by the file names, so files with the same name stored in different locations will have the same row labels.
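Since duplicate file names lead to duplicate row labels, it can be useful to check a file list before importing. The following sketch groups paths by file name and reports any name that occurs more than once; the helper name and paths are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def duplicate_file_names(paths):
    """Map each file name that occurs more than once to the paths that share it."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[Path(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


paths = ["run1/sample1.CNT", "run2/sample1.CNT", "run1/sample2.CNT"]
print(duplicate_file_names(paths))
```

Renaming one copy of each reported file before import keeps the row labels in the output spreadsheet unique.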
The Affymetrix CNCHP import tool converts multiple CNCHP files into one aggregate spreadsheet containing the log2 ratio values in a format ready to be used for analysis. CNCHP files can be created for the Mapping 100k, 500k, and SNP 6.0 arrays. See Creating CNCHP Files Using Affymetrix Genotyping Console 2.0 for information on creating Affymetrix CNCHP files.
From the Import CNCHP Files... dialog (see Affymetrix CNCHP File Import Window), you can click Add Files to select CNCHP files to convert. This will open a file chooser where you can select one or more CNCHP files. The CNCHP files you selected will appear in the CNCHP file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CNCHP files from the window, select the unwanted files and click Remove. You may continue adding CNCHP files by clicking the Add Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the same import.
You can also change the name of the dataset at this time.
Note
Row labels in the output spreadsheet will be determined by the file names, so files with the same name stored in different locations will have the same row labels.
The Affymetrix CYCHP import tool converts multiple CYCHP files into aggregate spreadsheets containing one or more of the possible datasets contained within CYCHP files. The possible datasets are:
From the Import CYCHP Files... dialog (see Affymetrix CYCHP File Import Window), you can click Add Files to select CYCHP files to convert. This will open a file chooser where you can select one or more CYCHP files. The CYCHP files you selected will appear in the CYCHP file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CYCHP files from the window, select the unwanted files and click Remove. You may continue adding CYCHP files by clicking the Add Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the same import.
You can also change the name of the dataset at this time.
Select the output datasets to create as well as indicate whether or not CN segment covariate spreadsheets should be created. The covariates spreadsheets are defined below.
Using this option, Affymetrix DMET data is imported. Hemizygous markers are converted to homozygous markers, and tri-allelic markers are converted into two columns, each containing the major allele and one of the minor alleles. A marker map is requested after choosing the DMET file. If there are tri-allelic markers, additional markers are added to the marker map so that both columns will be mapped.
Four different Illumina file types can be imported via the Illumina submenu of the import menu. These include Illumina DSF File, Illumina Final Report From One or More Files, Matrix Text File, and iControlDB Data.
For the Illumina platform, you must use BeadStudio or GenomeStudio with the Golden Helix SVS DSF Plug-In to export the log2 ratio values from your Bead/GenomeStudio project. For instructions on how to install and use the plug-in see Exporting Data from GenomeStudio. With the DSF Plug-In, you can choose to export the entire project or specific chromosomes.
The Import Illumina DSF File dialog allows you to directly import log2 ratio data into a spreadsheet. This process will import all of the chromosomes stored in the DSF file.
This option imports multiple fields of data from Illumina Final Report text files. It can be used to import only genotype data, or multiple real-valued columns such as B-Allele Frequency or Log R Ratios. The user can choose from all fields found in the text file and select which ones should be imported.
A separate dataset is created for each field, except for allele fields which are combined into a genotype column and GC score which is used to filter genotypes.
This option imports a user-specified file in the matrix Illumina text format that can be exported using the Final Report wizard in Illumina’s BeadStudio Software. The file can be comma or tab delimited, and GC calls can either be included or excluded.
The exported file will contain file information, a header line and then the genotype data.
During the import process, you will be prompted to choose a file to import. Once a file is chosen, you will be asked to specify whether it is tab or comma delimited and whether it contains GC Score data. If you choose to use a GC Score threshold, a second dialog box will appear asking you to input your threshold value (range: 0 - 1). SNPs that have a GC Score below this threshold will be imported as missing (?_?).
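The effect of the GC Score threshold on a single genotype call can be sketched as follows. This illustrates the rule stated above and is not the SVS code; the function name and example values are hypothetical.

```python
def apply_gc_threshold(genotype, gc_score, threshold):
    """Return the genotype, or the missing genotype "?_?" if the score is below threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return "?_?" if gc_score < threshold else genotype


print(apply_gc_threshold("A_G", 0.12, 0.25))  # ?_?
print(apply_gc_threshold("A_G", 0.80, 0.25))  # A_G
```

A call whose score equals the threshold is kept, since only scores strictly below the threshold are set to missing.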
The SNPs will be saved in the spreadsheet in the order they appear. You will still want to import a marker map and apply it to get a reordered spreadsheet.
This option imports iControlDB formatted data into SVS. Illumina’s iControlDB is an online database containing publicly available data, to be used as controls, for example, in an association study. The data will contain genotype and phenotype data from individuals generated from Illumina genotyping products.
The Agilent file import tool reads Agilent text files (TXT tab delimited files) that were created using the Agilent Feature Extraction software and allows you to import various fields into Golden Helix SVS for analysis. All fields are imported into SVS as they are stored in the TXT file except for the LogR field. Agilent uses base 10 for all logarithms. To be consistent for analysis, the import process converts this field into a base 2 logarithm. Marker map information from the text files is also imported, and the resulting dataset has the marker map applied.
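The base conversion follows from the change-of-base identity log2(x) = log10(x) / log10(2). A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
import math

LOG10_OF_2 = math.log10(2.0)


def log10_ratio_to_log2(value):
    """Convert a base 10 log ratio to a base 2 log ratio."""
    return value / LOG10_OF_2


# A two-fold gain has a log10 ratio of about 0.30103, which is 1.0 in base 2.
print(round(log10_ratio_to_log2(math.log10(2.0)), 6))
```

Dividing by log10(2) is equivalent to multiplying each stored value by roughly 3.3219.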
From the Import Agilent File dialog (see Agilent File Import Window), first select the TXT files you want to include in the dataset. To select the files, click on the Add Files button and use the file browser to choose the files for import. The files you selected will appear in the import dialog window. You may add all of the TXT files in a directory by using the Add Directory button. To remove files from the window, select the unwanted samples and click Remove. You may continue adding files by clicking the Add Files or the Add Directory buttons again. Multiple selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select several individual files. All TXT files must be of the same type and length and must contain the same marker map information in order to be imported together. If any files do not match, an error message will be generated and the non-matching files will need to be removed from the list of files to import.
You can also change the name of the dataset at this time.
The NimbleGen data summary file import tool reads NimbleGen *_segMNT.txt text files (TXT tab delimited files) that were created using Roche NimbleGen Software and allows you to import various log ratio fields into Golden Helix SVS for analysis. The selected field is imported into SVS, and a marker map is created and applied to the dataset.
From the Import NimbleGen Data dialog (see NimbleGen File Import Window), first select the TXT files you want to include in the dataset. To select the files, click on the Add Files button and use the file browser to choose files for import. The files you selected will appear in the import dialog window. You may add all of the TXT files in a directory by using the Add Directory button. To remove files from the window, select the unwanted samples and click Remove. You may continue adding files by clicking on the Add Files or the Add Directory buttons again. Multiple selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select several individual files. All TXT files must be of the same type and length and must contain the same marker map information in order to be imported together. If any files do not match, an error message will be generated and the non-matching files will need to be removed from the list of files to import.
The possible log ratio fields that can be imported one or more at a time (if they are available) in all of the text files are:
If multiple log ratio data fields are selected, a separate dataset will be created for each field.
You can also change the name of the dataset at this time.
Golden Helix SVS supports the import of family-based data in FBAT/PBAT Pedigree format, FBAT/PBAT Phenotype format, text pedigree format, and family-based text phenotype format.
Note
Golden Helix SVS also supports joining family-based spreadsheets with other spreadsheets. This may be useful if the genetic data for your family-based study comes from sources such as the Affymetrix GeneChip ™.
The format of a text pedigree should be as follows:
The format of a text phenotype should be as follows:
In the Import FBAT Pedigree File dialog (see FBAT Pedigree Import Window), select the pedigree file (file with the .ped extension) by clicking on Browse and navigating through the file manager to the desired file. You may edit the dataset name at this time. You must also specify the Sex and Affection Status Field Encodings your file uses. Additionally, you may change the default missing encoding options by clicking on the Advanced Options tab and indicating the value that your file uses for missing phenotype and/or missing genotype.
In the Import FBAT Phenotype File dialog (see FBAT Phenotype Import Window), select the phenotype file (file with the .phe extension) by clicking on Browse and navigating through the file manager to the desired file. You may edit the dataset name at this time. You may also indicate a custom encoding for missing data by clicking on the Advanced Options tab and indicating the character that your phenotype file uses for missing data.
In the Import Text Pedigree dialog (see Text Pedigree Import Window), select the text pedigree file by clicking on Browse and navigating through the file manager to the desired text file. You will need to indicate how the text file is delimited in the drop down menu. Possible options are comma, white-space, tab delimited or “Other ->”. If your text file uses a delimiter other than comma, space or tab, select “Other ->” and indicate the character used in the text box to the right of the menu. You have the option of generating row labels from the Patient ID and Family ID columns or by using the first column in the text file. Additionally, you must specify how Sex and Affection Status are encoded in your file by selecting a specification for each.
On the Advanced Options tab you can indicate a custom encoding for missing data by entering the string used for missing values in the text box. You can also change the allele delimiter if your text file uses a character different from the default underscore (_). There are several possible options and also an “Other ->” category where you can specify the character used to the right of the menu. If there are header lines in your text file you can skip them by checking the appropriate box and indicating the number of rows in the header of the file.
In the Import Text Phenotype dialog (see Text Phenotype Import Window), select the text phenotype file by clicking on Browse and navigating through the file manager to the desired text file. You will need to indicate how the text file is delimited in the drop down menu. Possible options are comma, white-space, tab delimited or “Other ->”. If your text file uses a delimiter other than comma, space or tab, select “Other ->” and indicate the character used in the text box to the right of the menu. You have the option of generating row labels from the Patient ID and Family ID columns or by using the first column in the text file. On the Advanced Options tab you can indicate a custom encoding for missing data by entering the string used for missing values in the text box. You can also change the allele delimiter if your text file uses a character different from the default underscore (_). There are several possible options and also an “Other ->” category where you can specify the character used to the right of the menu. If there are header lines in your text file you can skip them by checking the appropriate box and indicating the number of rows in the header of the file.
This function imports user-specified text files with extensions .txt or .csv from the HapMap project. Multiple HapMap files cannot be merged together with this script.
This function imports SNP, Insertion, Deletion, and Substitution variants from the var-[ASM-ID].tsv files provided by Complete Genomics (http://media.completegenomics.com/documents/DataFileFormats112.pdf). These files can be imported directly from the provided bzip2 sources. There is no need to decompress them beforehand. The user can choose to import one var file or several var files simultaneously. When multiple files are imported the data will be combined into a single spreadsheet. Each file is assumed to contain data for exactly one sample.
Options for the import include the base dataset name. If specified, this will define the name of the dataset created by the import. If the default value of “*” is used, the dataset will take the name of the first file in the input list. The user may also specify whether holes in the input are filled with data indicating either homozygous reference or a missing genotype. Holes in the input occur when one of the input samples has data for a particular chromosome and position, yet another does not.
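The idea of filling holes when combining several single-sample files can be sketched with plain dictionaries. This is an illustration of the concept only, not the SVS import logic: the data structures, the function name, and the `fill` argument are hypothetical, and genotypes are written with the underscore allele delimiter used elsewhere in this chapter.

```python
def merge_samples(samples, reference, fill="missing"):
    """Combine per-sample calls into one table, filling holes.

    samples:   {sample_name: {(chromosome, position): genotype}}
    reference: {(chromosome, position): reference_allele}
    fill:      "reference" for homozygous reference, anything else for "?_?"
    """
    loci = sorted(set().union(*(calls.keys() for calls in samples.values())))
    table = {}
    for name, calls in samples.items():
        row = []
        for locus in loci:
            if locus in calls:
                row.append(calls[locus])
            elif fill == "reference":
                ref = reference[locus]
                row.append(f"{ref}_{ref}")
            else:
                row.append("?_?")
        table[name] = row
    return loci, table


samples = {
    "S1": {("1", 100): "A_G"},
    "S2": {("1", 200): "C_T"},
}
reference = {("1", 100): "A", ("1", 200): "C"}
print(merge_samples(samples, reference, fill="reference"))
print(merge_samples(samples, reference, fill="missing"))
```

Filling with homozygous reference assumes that the absence of a record means no variant was called at that position, whereas filling with missing makes no such assumption.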
This import tool assumes records in all input files to be grouped by locus and otherwise ordered by chromosome and start position. The chromosome order is assumed to be (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, M). If the input files are found to violate these assumptions the import tool will display a warning.
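The ordering assumption can be checked ahead of time on a list of (chromosome, start) pairs extracted from a file. The sketch below uses the chromosome order given above; the function name is hypothetical, and a chromosome label outside the listed set raises a KeyError rather than being silently accepted.

```python
# Chromosome order assumed by the import tool: 1 through 22, then X, Y, M.
CHROMOSOME_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]
RANK = {chrom: index for index, chrom in enumerate(CHROMOSOME_ORDER)}


def first_order_violation(records):
    """Return the index of the first out-of-order (chromosome, start) record, or None."""
    previous = None
    for index, (chrom, start) in enumerate(records):
        key = (RANK[chrom], start)
        if previous is not None and key < previous:
            return index
        previous = key
    return None


print(first_order_violation([("1", 10), ("1", 20), ("2", 5), ("X", 1)]))  # None
print(first_order_violation([("2", 5), ("1", 10)]))  # 1
```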
This function imports 1000 Genomes .vcf file data into multiple spreadsheets. Special handling is provided for genotype data. The user can choose to import one VCF file or several VCF files simultaneously.
Options for the import include the base dataset name. If specified, this will define the naming prefix for each dataset created by the import. If the default value of “*” is used, the dataset will take the name of the first file in the input list. By default, sample names found to be duplicated in multiple files are disambiguated by prepending a part of the source file name. This option can be disabled by preference. The user may also specify whether holes in the input are filled with data indicating either homozygous reference or a missing genotype. Holes in the input occur when one of the input samples has data for a particular chromosome and position, yet another does not. The import may be limited to a subset of the chromosomes provided in the input files by activating the “Include only Chromosome(s)” option and typing in a comma delimited list of the chromosomes for which data is to be imported. The second dialog (output selection) may also be skipped at the user’s option.
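The effect of restricting an import to a comma delimited list of chromosomes can be previewed on a VCF file with a short filter. This sketch relies only on the VCF convention that header lines start with "#" and that the first tab-delimited field of a data line is the chromosome; the function names are hypothetical, and chromosome labels are compared exactly as they appear in the file.

```python
def parse_chromosome_list(text):
    """Turn a comma delimited string such as "1, 2, X" into a set of labels."""
    return {label.strip() for label in text.split(",") if label.strip()}


def filter_vcf_lines(lines, chromosomes):
    """Yield header lines plus the data lines whose chromosome is in the set."""
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] in chromosomes:
            yield line


lines = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID",
    "1\t100\t.",
    "2\t200\t.",
    "X\t5\t.",
]
for line in filter_vcf_lines(lines, parse_chromosome_list("1, X")):
    print(line)
```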
The second dialog will appear after scanning the input files for available data to be imported. Depending on the available data, fields in four categories may be presented for selection. The categories are:
Some selections allow the user to choose which output to generate. Possible output types include:
The “Reference” field available under “Fixed Field Marker Data” is selected as a marker map field by default and, for best utility in further analysis, should remain so.
The variant type information is appended to each column in the resulting spreadsheet(s). The following variant abbreviations are added to the column headers:
This import tool has been tested successfully on well formed VCF input from versions 4.0, 3.3, and 3.2 of the 1000genomes.org spec (http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40).
This option imports the mldose, mlgeno and mlinfo files output from MACH. The mlinfo file has to be imported but the mldose and mlgeno files are optional.
Note
If your MACH output data was generated using the HapMap website it will need to be converted from UTF-16 encoding to UTF-8 encoding for this script to work. There are free utilities online for this purpose. One such website is http://www.fileformat.info/convert/text/utf2utf.htm.
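As an alternative to an online converter, the re-encoding can be done locally with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the UTF-16 file starts with a byte order mark, which is how Python's "utf-16" codec determines the byte order.

```python
def utf16_to_utf8(source_path, target_path):
    """Re-encode a UTF-16 text file as UTF-8, leaving line endings unchanged."""
    with open(source_path, "r", encoding="utf-16", newline="") as source, \
            open(target_path, "w", encoding="utf-8", newline="") as target:
        for line in source:
            target.write(line)
```

A call such as `utf16_to_utf8("study.mlinfo", "study_utf8.mlinfo")` (file names are placeholders) writes a converted copy and leaves the original file untouched.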
This script imports a user-specified text file (either CSV or TXT) generated from Parallele.
This function imports the tabularized quantification data from pipeline.goldenhelix.com.
The file should have genes (or isoforms) as rows and samples as columns. The first four (optionally five) columns should contain the gene (or isoform) name, chromosome, start, stop and optionally transcripts.
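The expected column layout can be illustrated by splitting one data row into its annotation columns and per-sample values. This is a sketch under stated assumptions: the fields are taken to be already split (the delimiter is not specified here), the sample values are taken to be numeric, and the function name and example row are hypothetical.

```python
def split_quantification_row(fields, has_transcripts=False):
    """Separate the leading annotation columns from the per-sample values."""
    n_annotation = 5 if has_transcripts else 4
    if len(fields) <= n_annotation:
        raise ValueError("row has no sample columns")
    name, chromosome, start, stop = fields[:4]
    return {
        "name": name,
        "chromosome": chromosome,
        "start": int(start),
        "stop": int(stop),
        "transcripts": fields[4] if has_transcripts else None,
        "values": [float(value) for value in fields[n_annotation:]],
    }


row = ["GENE1", "1", "1000", "2000", "12.5", "0", "3.25"]
print(split_quantification_row(row))
```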