Python Application Programming Interface (API)

GHI Module for the Python API

This module contains functions that do not require an object to act on.

Project Level Operations

ghi.closeProject()

Closes the current project without saving its state.

See also

exit

Examples

>>> ghi.closeProject()
>>> 
ghi.enableNewViewers(enable)

Enables or disables the display of new GUI viewers.

Showing viewers while running a script can be inconvenient. This command either suppresses or allows the display of new GUI viewers while a script executes. The setting can be changed at any time during a run, and it only affects scripts run from the Scripts menu of a viewer.

Parameters:

enable : integer (one of below)

Examples

>>> ghi.enableNewViewers(ghi.const.AllGuiElements)
>>> 
ghi.exit()

Exit and close the program. You may be prompted to save the open project.

See also

closeProject

Examples

>>> ghi.exit()
ghi.getCurrentObject()

Returns the currently selected Project Navigator Window Node.

Returns: navNode : NavNode Object

Examples

>>> ss = ghi.getCurrentObject()
>>> ss
spreadsheet(
 Name=Affy 6.0 HapMap 270 Genotypes - Column Subset
 Node ID=65
 Rows=270
 Columns=1000
 Data Set Name=Affy 6.0 HapMap 270 Genotypes
)
>>> 
ghi.getObject(id)

Returns an object for the Project Navigator Window Node with the requested ID.

Parameters: id : integer
Returns: navNode : NavNode Object

See also

getCurrentObject

Examples

>>> ss = ghi.getObject(65)
>>> ss
spreadsheet(
 Name=Affy 6.0 HapMap 270 Genotypes - Column Subset
 Node ID=65
 Rows=270
 Columns=1000
 Data Set Name=Affy 6.0 HapMap 270 Genotypes
)
>>> 
ghi.getProperty(property)

Returns a file system path specific to the program or project.

Parameters:

property : integer (one of below)

Returns:

absolute_path : string

Path specific to the property requested, or an empty string if a project path is requested and no project is open

Examples

>>> ghi.getProperty(ghi.const.ProjectPath)
u'C:/Projects/Discovery'
>>> 
ghi.chooseFile(filter, dialogTitle, mode)

Displays a dialog window for browsing and selecting a file or multiple files.

Parameters:

filter : string

File extension of the form '*.txt' to show all TXT files

dialogTitle : string

Title of dialog window

mode : integer (one of below)

Specifies type of file chooser

Returns:

filepaths : list of strings

One or more full file paths

Examples

>>> myFilePath = ghi.chooseFile('*.txt', 'Choose a file please...', 
                                ghi.const.ChooseSingleFile)
>>> myFilePath
[u'D:/Example Data/file1.txt']
>>> myMultFiles = ghi.chooseFile('*.txt', 'Choose several files...', 
                                 ghi.const.ChooseMultipleFiles)
>>> myMultFiles
[u'D:/Example Data/file1.txt', u'D:/Example Data/file2.txt']
>>> saveAsPath = ghi.chooseFile('*.txt','File name to save as...', 
                                ghi.const.ChooseSaveAs)
>>> saveAsPath
[u'D:/Example Data/file3.txt']
>>> 
ghi.chooseDirectory(caption, startDir)

Presents a dialog allowing directory selection.

Parameters:

caption : string

Selection dialog caption

startDir : string, optional

Directory in which the selection dialog will initially open

Returns:

directory path : string

Examples

>>> myDirPath = ghi.chooseDirectory('Select directory to save the file', 
                                    'C:/Projects/')
>>> 
>>> myDirPath
u'C:/Projects/Discovery'
>>> 
ghi.chooseMarkerMap()

Displays the marker map selector window.

Returns: markermap : string

Examples

>>> ghi.chooseMarkerMap()
u'Affy SNP5 Marker Map - na31 2010_08_30.dsm'
>>>
ghi.tmpFileName(extension)

If a project is open, creates a unique temporary file name with the given extension in the project temp folder; otherwise, creates a unique temporary file name with the given extension in the system temp folder.

Parameters:

extension : string

Temporary file is created with the specified extension

Returns:

tempFilePath : string

File path to the temporary file
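
The naming behavior described above can be sketched with the standard tempfile and uuid modules (a minimal illustration only, not the actual implementation; project_temp_dir is a hypothetical stand-in for the project temp folder that SVS tracks internally):

```python
import os
import tempfile
import uuid

def tmp_file_name(extension, project_temp_dir=None):
    """Sketch of ghi.tmpFileName(): build a unique file name with the
    given extension in the project temp folder if one is available,
    otherwise in the system temp folder."""
    base_dir = project_temp_dir if project_temp_dir else tempfile.gettempdir()
    # uuid4 makes the generated name practically unique
    return os.path.join(base_dir, f"tmp_{uuid.uuid4().hex}.{extension}")
```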

ghi.saveProject()

Saves the current project.

See also

saveProjectAs

Examples

>>> ghi.saveProject()
>>> 
ghi.saveProjectAs(projectCopyPath, projectCopyName, switchToCopy, preservePassword)

Saves a copy of the current project.

Parameters:

projectCopyPath : string

Path and directory where the copy of the project is to be saved.

projectCopyName : string

Name of the project copy.

switchToCopy : bool

Open the copied project on completion

preservePassword : bool, optional, default: True

If a password is set for the current project, preserve it for the copy

Returns:

success : bool

Whether or not a copy of the project has been created successfully

Examples

>>> ghi.saveProjectAs('C:/Projects/', 'DiscoveryCopy', False)
True
>>> 
ghi.projectMaxNodeID()

Returns an integer value that is greater than the maximum ID of the current project’s nodes.

Examples

>>> ghi.projectMaxNodeID()
10
>>> 
ghi.message(msg, details)

Displays the input message string in the GUI in a message box.

This command displays the msg text in a dialog until the OK button is clicked; the optional details text appears in an expandable Details section.

Parameters:

msg : string

String to display in a message dialog.

details : string, optional

Extra information to display if the Details box is selected.

Examples

>>> ghi.message('Hello World!','This dialog is working correctly.')
>>> 
ghi.newProject(projectName, projectLocation)

Creates a new project.

Parameters:

projectName : string

Name of the new project

projectLocation : string

An existing folder in the file system where the new project will be created

Returns:

success : bool

Whether or not the new project was successfully created

Examples

>>> ghi.newProject('Discovery','c:/Projects')
True
>>>
ghi.projectIsOpen()

Detects whether a project is currently open.

Returns:

projectOpen : bool

Whether or not a project is currently open

Examples

>>> ghi.openProject("C:/Projects/Discovery/Discovery.ghp")
>>>
>>> ghi.projectIsOpen()
True
>>> ghi.closeProject()
>>> ghi.projectIsOpen()
False
>>>
ghi.openProject(location)

Opens an existing project at the specified path and file name location.

Parameters:

location : string

Project name and full file path

Examples

>>> ghi.openProject('c:/Projects/Discovery')
>>> 
>>> 
ghi.openUrl(url)

Opens the specified URL in the default web browser.

Parameters: url : string

Examples

>>> ghi.openUrl("http://data.goldenhelix.com/das")
>>>
ghi.openFolder(path)

Opens the specified path in the file explorer of the current platform.

Parameters: path : string

Examples

>>> ghi.openFolder("C:/data")
>>> 

Import Operations

ghi.convertAffyAnnotations(fileNames)

Converts Affymetrix annotation files from the text format available for direct download from the Affymetrix website to a DSM marker map file. This is useful for converting older annotation versions that are not available through the Golden Helix Affymetrix Annotation Download utility in SVS.

Parameters:

fileNames : list of strings

List of file paths for the CSV files to convert

Returns:

success : bool

Examples

>>> ghi.convertAffyAnnotations([
        'd:/ConvertAffy/GenomeWideSNP_6.na31.annot.csv',
        'd:/ConvertAffy/GenomeWideSNP_6.cn.na31.annot.csv'])
True
>>> 
ghi.convertMarkerMap(fileName, markerMapName, snpIdCol, chromosomeCol, positionCol, **kws)

Converts a marker map stored in a text file or MAP file to the DSM format.

Parameters:

fileName : string

File and path of the marker map to convert

markerMapName : string

Name for the marker map and DSM file

snpIdCol : integer

Column number containing marker name information

chromosomeCol : integer

Column number containing chromosome information

positionCol : integer

Column number containing position information

delimiter : string (one of below), optional

Specifies text file delimiter
  • ‘,’: comma-delimited (default)
  • ‘ ‘: space-delimited
  • ‘\t’: tab-delimited
  • any other one character delimiter string

missingEncoding : string, optional

Specifies the missing value encoding
  • ‘?’: question mark (default)
  • any other one-character string

skipNumLines : integer, default: 1, optional

Indicates number of header rows

stopReadSentinel : string, optional

Stop reading the input file when this string is encountered as the first field in the file.

optColumns : dictionary, optional

Specifies additional marker map columns
  • Ex: optColumns = {‘cytoband’:colNumber, ‘RS ID’:colNumber}
Returns:

success : bool

Examples

>>> ghi.convertMarkerMap('c:/Temp/AffySNP6_v31MM.csv',
                         'AffySNP6_v31',1,2,3, delimiter = ',',
                         missingEncoding = '?',
                         optColumns = {'Cytoband':5,'Reference':6})
True
>>> 
ghi.importBED(dataSetName, bedName, famName, bimName)

Imports a BED file with associated FAM and BIM files.

Parameters:

dataSetName : string

Name to use as the dataset in the project

bedName : string

Path and file name for the BED file

famName : string

Path and file name for the associated FAM file

bimName : string

Path and file name for the associated BIM file

Returns:

spreadsheet : Spreadsheet

Examples

>>> mySS = ghi.importBED('My BED Dataset', '/data/myBEDfile.bed', 
                         '/data/myFAMfile.fam', '/data/myBIMfile.bim')
>>> 
>>> mySS
spreadsheet(
 Name=My BED Dataset - Sheet 1
 Node ID=19
 Rows=193
 Columns=500574
 Data Set Name=My BED Dataset
)
>>> 
ghi.importCEL(celFiles, **kws)

Imports Affymetrix CEL file data and normalizes the copy number intensity values.

Parameters:

celFiles : list of strings

List of full string paths for CEL file names

referenceMode : integer (one of below)

Specify mode for building or using references

referenceSheetId : integer

Node id of spreadsheet containing reference status, requires referenceMode = ghi.const.ReferenceSubset and referenceStatusColumn

referenceStatusColumn : integer

Column containing the reference status in the spreadsheet specified by referenceSheetId, requires referenceMode = ghi.const.ReferenceSubset and referenceSheetId

referencePopulation : string

Name of reference population to auto-download precomputed reference vectors, for supported platforms.

  • ‘All’: All populations CEU, CHB+JPT and YRI
  • ‘CEU’: HapMap Ceph population
  • ‘CHB+JPT’: HapMap Chinese and Japanese populations
  • ‘YRI’: HapMap Yoruba population

markerMapName : string, optional

Name of the marker map in the marker maps folder. If not specified, the map will be auto-detected.

cdfFileName : string, optional

Name of the Affymetrix Library File in the AffyLibraryFiles folder. If not specified the CDF file will be auto-selected.

tempDir : string, optional

Specification of a temporary directory, this is recommended if the CEL files are stored on a network drive.

mappingSheetId : integer, required for 500k NSP/STY CEL files

Node ID of the spreadsheet containing NSP/STY mapping information

dropReferences : bool, default: False

Whether or not to drop the reference samples from the output spreadsheet

importABs : bool, default: False

Whether or not to import raw A/B intensities

importNormalizedABs : bool, default: False

Whether or not to import normalized A/B intensities

importUnmergedLogRs : bool, default: False

Whether or not to import unmerged LogR ratios

importLogRsColwise : bool, default: False

Whether or not to import LogR ratios columnwise

importLogRsRowwise : bool, default: True

Whether or not to import LogR ratios rowwise

datasetName : string, default: ‘Affy CEL Dataset’

Allows specification of the new dataset base name

Returns:

spreadsheets : Spreadsheets

All selected spreadsheets are imported into the open project.

Examples

>>> files = ghi.chooseFile('*.CEL', 'Choose CEL Files to Import', 
                           ghi.const.ChooseMultipleFiles)
>>> files
[u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA06985_GW6_C.CEL', 
 u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA06991_GW6_C.CEL', 
 u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA06993_GW6_C.CEL', 
 u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA06994_GW6_C.CEL', 
 u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA07000_GW6_C.CEL', 
 u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA07019_GW6_C.CEL', 
 u'N:\Affymetrix\SNP6SampleData\hapmap\Caucasians\NA07022_GW6_C.CEL']
>>> celImport = ghi.importCEL(files, 
              referenceMode = ghi.const.ReferencePopulation,
              referencePopulation='CEU', 
              markerMapName='Affy SNP6 Marker Map - na30 2010_08_27.dsm',
              importNormalizedABs = True, importLogRsColwise = True, 
              importLogRsRowwise = True, 
              datasetName = 'Seven CEU HapMap Samples')
>>> 
>>> celImport
[spreadsheet(
 Name=Seven CEU HapMap Samples - Quantile Normalized - SNP - Sheet 1
 Node ID=94
 Rows=929967
 Columns=14
 Data Set Name=Seven CEU HapMap Samples - Quantile Normalized - SNP
 ), spreadsheet(
 Name=Seven CEU HapMap Samples - Quantile Normalized - CN - Sheet 1
 Node ID=97
 Rows=945806
 Columns=7
 Data Set Name=Seven CEU HapMap Samples - Quantile Normalized - CN
 ), spreadsheet(
 Name=Seven CEU HapMap Samples LogR - Samples Columnwise - Sheet 1
 Node ID=100
 Rows=1875773
 Columns=7
 Data Set Name=Seven CEU HapMap Samples LogR - Samples Columnwise
 ), spreadsheet(
 Name=Seven CEU HapMap Samples LogR - Samples Rowwise - Sheet 1
 Node ID=103
 Rows=7
 Columns=1875773
 Data Set Name=Seven CEU HapMap Samples LogR - Samples Rowwise
)]
>>> 
ghi.importCHP(chpFiles, dataSetName, libraryPath, confidenceScore)

Imports data from CHP files into a dataset.

Parameters:

chpFiles : list of strings

List of full string paths for CHP file names

dataSetName : string

Name for created dataset

libraryPath : string, optional

Path to Affymetrix library files directory if different from the default path

confidenceScore : float, optional

Specifies a value to use as a confidence score upper limit threshold

Returns:

spreadsheet : Spreadsheet

Examples

>>> ss = ghi.importCHP(['C:\HapMap_CEU\NSP\CEU_NA06985_NSP.CHP',
                        'C:\HapMap_CEU\NSP\CEU_NA06991_NSP.CHP',
                        'C:\HapMap_CEU\NSP\CEU_NA06993_NSP.CHP',
                        'C:\HapMap_CEU\NSP\CEU_NA06994_NSP.CHP'],
                       'Four HapMap Samples',confidenceScore = 0.5)
>>> 
>>> ss
spreadsheet(
 Name=Four HapMap Samples Dataset - Sheet 1
 Node ID=10
 Rows=4
 Columns=262264
 Data Set Name=Four HapMap Samples Dataset
) 
>>> 
ghi.importCNCHP(cnchpFiles, dataSetName)

Imports data from CNCHP files into a dataset.

Parameters:

cnchpFiles : list of strings

List of full string paths for CNCHP file names

dataSetName : string

Name for created dataset

Returns:

spreadsheet : Spreadsheet

Examples

>>> ss = ghi.importCNCHP(['C:\HapMap_CEU\NA06985_GW6_C.CN5.cnchp',
                          'C:\HapMap_CEU\NA06991_GW6_C.CN5.cnchp',
                          'C:\HapMap_CEU\NA06993_GW6_C.CN5.cnchp'],
                         'Three HapMap Samples')
>>>
>>> ss
spreadsheet(
 Name=Three HapMap Samples Dataset - Sheet 1
 Node ID=13
 Rows=3
 Columns=1821903
 Data Set Name=Three HapMap Samples Dataset
)
>>> 
ghi.importCNT(cntFiles, dataSetName)

Imports data from CNT files into a dataset.

Parameters:

cntFiles : list of strings

List of full string paths for CNT file names

dataSetName : string

Name for created dataset

Returns:

spreadsheet : Spreadsheet

Examples

>>> ss = ghi.importCNT(['/data/sample1.cnt','/data/sample2.cnt',
                        '/data/sample3.cnt'],'My CNT Dataset')
>>> ss
spreadsheet(
 Name=My CNT Dataset - Sheet 1
 Node ID=32
 Rows=3
 Columns=2000
 Data Set Name=My CNT Dataset
)
>>> 
ghi.importDSF(fileName)

Imports a Golden Helix DSF file.

Parameters:

fileName : string

Path to the DSF file

Returns:

spreadsheet : Spreadsheet

Examples

>>> mySS = ghi.importDSF('/data/myDSFfile.dsf')
>>> 
>>> mySS
spreadsheet(
 Name=myDSFfile_helixtree_read Pedigree - Sheet 1
 Node ID=23
 Rows=193
 Columns=500574
 Data Set Name=myDSFfile_helixtree_read Pedigree
)
>>> 
ghi.importData(fileName, columnHeaderMode, alleleDelimiter, rowLabelColumn, workSheet)

Imports data from third-party files.

Parameters:

fileName : string

Path and file name for third-party file

columnHeaderMode : integer (one of below), optional

Specifies method for column headers

alleleDelimiter : string, default: ‘_’, optional

Specifies genotypic allele delimiter

rowLabelColumn : integer, > 0, optional

Specifies method for row labels. If parameter is not specified then generic labels are created

workSheet : integer, default: 1, optional

Specifies worksheet to import if applicable

Returns:

spreadsheet : Spreadsheet

See also

Third Party File

Examples

>>> ss = ghi.importData('/data/myFile.xls',
                        columnHeaderMode = ghi.const.HeaderAutoDetect,
                        alleleDelimiter = '|', rowLabelColumn = 1, 
                        workSheet = 1)
>>> 
ghi.importFbatPedigree(fileName, dataSetName, missingPhenotype, missingGenotype)

Imports a FBAT Pedigree (PED) file.

Parameters:

fileName : string

Path and file name of PED file for import

dataSetName : string

Name for the dataset in the project

missingPhenotype : string, default: ‘?’

Specifies the missing value encoding

missingGenotype : string, default: ‘?’

Specifies the missing allele encoding

sexEncoding : string, default: ‘012’, optional

Specifies how the Sex field is encoded
  • ‘?01’: Missing = ?, Male = 0, Female = 1
  • ‘012’: Missing = 0, Male = 1, Female = 2

affStatusEncoding : string, default: ‘012’, optional

Specifies how the Affection Status field is encoded, if a binary field
  • ‘?01’: Missing = ?, Control = 0, Case = 1, or import as numeric field
  • ‘012’: Missing = 0, Control = 1, Case = 2
Returns:

spreadsheet : Spreadsheet

Spreadsheet where the first six columns contain pedigree information.

Examples

>>> mySS = ghi.importFbatPedigree('/data/myFBatPedigreeFile.ped', 
               'My Pedigree Dataset', missingPhenotype = '?', 
               missingGenotype = '?', sexEncoding = '?01',
               affStatusEncoding = '012')
>>> 
>>> mySS
spreadsheet(
 Name=My Pedigree Dataset Pedigree Dataset - Sheet 1
 Node ID=46
 Rows=3000
 Columns=56
 Data Set Name=My Pedigree Dataset Pedigree Dataset
)
>>> 
ghi.importFbatPhenotype(fileName, dataSetName, missingEncoding)

Imports a FBAT Phenotype (PHE) file.

Parameters:

fileName : string

Path and file name of PHE file for import

dataSetName : string

Name for the dataset in the project

missingEncoding : string, default: ‘?’, optional

Specifies the missing phenotype information

Returns:

spreadsheet : Spreadsheet

Examples

>>> mySS = ghi.importFbatPhenotype('/data/myFBatPhenotypeFile.phe', 
                                   'My Phenotype Dataset', '?')
>>> 
>>> mySS
spreadsheet(
 Name=My Phenotype Dataset Phenotype Dataset - Sheet 1
 Node ID=49
 Rows=1000
 Columns=22
 Data Set Name=My Phenotype Dataset Phenotype Dataset
)
>>> 
ghi.importGHD(fileName)

Imports a Golden Helix Legacy GHD file.

Parameters:

fileName : string

Path and file name of GHD file for import

Returns:

spreadsheet : Spreadsheet

Examples

>>> mySS = ghi.importGHD('/data/myGHDfile.ghd')
>>> 
>>> mySS
spreadsheet(
 Name=myGHDfile - Sheet 1
 Node ID=26
 Rows=270
 Columns=10005
 Data Set Name=myGHDfile
)
>>> 
ghi.importMapAsSpreadsheet(dsmFile)

Imports a genetic marker map DSM file into the project as a spreadsheet object.

Parameters:

dsmFile : string

Path and file name of DSM file for import

Returns:

spreadsheet : Spreadsheet

Examples

>>> mySS = ghi.importMapAsSpreadsheet(
              '/Golden Helix SVS/MarkerMaps/myMarkerMap.dsm')
>>> 
>>> mySS
spreadsheet(
 Name=myMarkerMap (na30) Marker Map - Sheet 1
 Node ID=43
 Rows=499264
 Columns=11
 Data Set Name=myMarkerMap (na30) Marker Map
)
>>> 
ghi.importPED(dataSetName, pedName, mapName, missingPhenotype, missingGenotype)

Imports PED files along with the corresponding MAP file.

Parameters:

dataSetName : string

Name for the dataset in the project

pedName : string

Path and file name of PED file for import

mapName : string

Path and file name of MAP file for import

missingPhenotype : integer, default: -9

Specifies the missing value encoding

missingGenotype : string, default: ‘0’

Specifies the missing allele encoding

Returns:

spreadsheet : Spreadsheet

Spreadsheet where the first six columns contain pedigree information.

Examples

>>> mySS = ghi.importPED('My PED Dataset','d:/Example Data/myPEDfile.ped', 
                         'd:/Example Data/myMAPfile.map', 
                         missingPhenotype = -9, missingGenotype = '?')
>>> 
>>> mySS
spreadsheet(
 Name=My PED Dataset - Sheet 1
 Node ID=4
 Rows=100
 Columns=206
 Data Set Name=My PED Dataset
)
>>> 
ghi.importTPED(dataSetName, tpedName, tfamName, missingPhenotype, missingGenotype)

Imports TPED files along with the corresponding TFAM file.

Parameters:

dataSetName : string

Name for the dataset in the project

tpedName : string

Path and file name of TPED file for import

tfamName : string

Path and file name of TFAM file for import

missingPhenotype : integer, default: -9

Specifies the missing value encoding

missingGenotype : string, default: ‘0’

Specifies the missing allele encoding

Returns:

spreadsheet : Spreadsheet

Examples

>>> mySS = ghi.importTPED('My TPED Dataset', '/data/myTPEDfile.tped', 
                          '/data/myTFAMfile.tfam', missingPhenotype= -9, 
                          missingGenotype = '?')
>>> 
>>> mySS
spreadsheet(
 Name=My TPED Dataset - Sheet 1
 Node ID=15
 Rows=193
 Columns=206
 Data Set Name=My TPED Dataset
)
>>> 
ghi.importText(fileName, dataSetName, **kws)

Imports a text-based file which uses any one-character delimiter.

Parameters:

fileName : string

Path and file name of the text file

dataSetName : string

Name of the dataset created from the imported file

rowLabelColumn : integer, > 0, optional

Specifies column for row labels. If parameter is not specified then generic labels are created.

delimiter : string (one of below), optional

  • ‘,’: comma-delimited (default)
  • ‘ ‘: space-delimited
  • ‘\t’: tab-delimited
  • any other one character delimiter string

missingEncoding : string, default list: ‘.’, ‘,’, ‘?’, ‘-’, and ‘---’, optional

Specifies a list of potential missing representations

alleleDelimiter : string, default: ‘_’, optional

Specifies genotypic allele delimiter, restricted to one character in length

readGenetic : bool, default: True, optional

Indicates whether or not all allele delimited data is genotypic instead of categorical

emptyAsGenotypic : bool, default: True, optional

Indicates whether columns full of empty or missing data should be encoded as genotypic

skipNumLines : integer, default: 0, optional

Specifies number of header rows to skip

stopReadSentinel : string, optional

Stop reading the input file when this string is encountered as the first field in the file.

baseNumericType : integer (one of below), optional

realsAsFloats : bool, default: False, optional

Indicates whether to encode Real value types as single precision floating point numbers rather than the default of double precision

Returns:

spreadsheet : Spreadsheet

See also

Text File

Examples

>>> mySS = ghi.importText('d:/Example Data/HM_Pheno.csv', 'My Data', 
                          rowLabelColumn = 1, delimiter = ',', 
                          missingEncoding = '?', readGenetic = 1, 
                          alleleDelimiter = '_', skipNumLines = 0)
>>> 
>>> mySS
spreadsheet(
 Name=My Data - Sheet 1
 Node ID=37
 Rows=270
 Columns=23
 Data Set Name=My Data
)
>>> 
ghi.importTextPedigree(fileName, dataSetName, **kws)

Imports a text pedigree file (in either CSV or TXT format).

Parameters:

fileName : string

Path and file name of the text file

dataSetName : string

Name to use as the dataset name in the project

rowLabelColumn : integer, > 0, optional

Specifies field for row labels. If the parameter is not specified then generic labels are created.

delimiter : string (one of below), optional

  • ‘,’: comma-delimited (default)
  • ‘ ‘: space-delimited
  • ‘\t’: tab-delimited
  • any other one character delimiter string

missingEncoding : string, default: ‘?’, optional

Specifies the missing phenotype encoding, can be any one character string.

alleleDelimiter : string, default: ‘_’, optional

Specifies genotypic allele delimiter, restricted to one character in length.

readGenetic : bool, default: True, optional

Indicates whether or not all allele delimited data is genotypic instead of categorical.

emptyAsGenotypic : bool, default: True, optional

Indicates whether columns full of empty or missing data should be encoded as genotypic

skipNumLines : integer, default: 0, optional

Specifies number of header rows to skip.

stopReadSentinel : string, optional

Stop reading the input file when this string is encountered as the first field in the file.

sexEncoding : string, default: ‘012’, optional

Specifies how the Sex field is encoded
  • ‘?01’: Missing = ?, Male = 0, Female = 1
  • ‘012’: Missing = 0, Male = 1, Female = 2

affStatusEncoding : string, default: ‘012’, optional

Specifies how the Affection Status field is encoded, if a binary field
  • ‘?01’: Missing = ?, Control = 0, Case = 1, or import as numeric field
  • ‘012’: Missing = 0, Control = 1, Case = 2
Returns:

spreadsheet : Spreadsheet

Spreadsheet where the first six columns contain pedigree information.

Examples

>>> mySS = ghi.importTextPedigree('/Example Data/data.txt', 'My Dataset', 
                                  delimiter = ' ',missingEncoding = '?', 
                                  readGenetic = 1,alleleDelimiter = '_', 
                                  skipNumLines = 1)
>>> 
>>> mySS
spreadsheet(
 Name=My Dataset - Sheet 1
 Node ID=52
 Rows=1170
 Columns=7
 Data Set Name=My Dataset
)
>>> 
ghi.importTextPhenotype(fileName, dataSetName, **kws)

Imports a text phenotype file (in either CSV or TXT format).

Parameters:

fileName : string

Path and file name of the text file

dataSetName : string

Name to use as the dataset name in the project

rowLabelColumn : integer, > 0, optional

Specifies field for row labels. If the parameter is not specified then labels are created based on the first two fields in the file.

delimiter : string (one of below), optional

  • ‘,’: comma-delimited (default)
  • ‘ ‘: space-delimited
  • ‘\t’: tab-delimited
  • any other one character delimiter string

missingEncoding : string, default: ‘?’, optional

Specifies the missing phenotype encoding, can be any one character string.

alleleDelimiter : string, default: ‘_’, optional

Specifies genotypic allele delimiter, restricted to one character in length.

readGenetic : bool, default: True, optional

Indicates whether or not all allele delimited data is genotypic instead of categorical.

emptyAsGenotypic : bool, default: True, optional

Indicates whether columns full of empty or missing data should be encoded as genotypic

skipNumLines : integer, default: 0, optional

Specifies number of header rows to skip.

stopReadSentinel : string, optional

Stop reading the input file when this string is encountered as the first field in the file.

baseNumericType : integer (one of below), optional

realsAsFloats : bool, default: False, optional

Indicates whether to encode Real value types as single precision floating point numbers rather than the default of double precision

Returns:

spreadsheet : Spreadsheet

Spreadsheet (which typically has row labels based on family and patient IDs).

Examples

>>> mySS = ghi.importTextPhenotype("/Example Data/phenotype.txt", 
                                   "Phenotype Data", delimiter = ",", 
                                   missingEncoding = ".")
>>> 
>>> mySS
spreadsheet(
 Name=Phenotype Data - Sheet 1
 Node ID=55
 Rows=403
 Columns=8
 Data Set Name=Phenotype Data
)
>>> 
ghi.importIlluminaFinalReport(fileNameList, baseDataSetName, **kws)

Imports one or more Illumina final report text files.

Parameters:

fileNameList : list of strings

A list of path and file name for each input file

baseDataSetName : string

Name to use as the dataset name prefix in the project. A ‘*’ may be specified; it is replaced with the name of the first input file.

delimiterScan : bool, default: True, optional

Indicates whether each file should be scanned to determine the field delimiter automatically.

delimiterList : list of strings, optional

May be used to specify override delimiters (only the first character of each string is used). A single-character delimiter should be specified for each input file. If a single character is specified instead of a list, it is used as the delimiter for all input files.
  • ‘’: no change from automatic choice (null character)
  • ‘,’: comma-delimited
  • ‘\t’: tab-delimited
  • any other one character delimiter string

fieldTypeScan : bool, default: True, optional

Indicates whether each file should be scanned to determine the data types for its fields automatically. Some speed improvement may be gained by turning this option off, but all non-genotypic data will be imported as categorical as a result.

fieldMap : list of dicts, optional

May be used to specify expected field mapping overrides. Expected fields are automatically determined in many cases, but the automatic selection may fail in some cases. A dict should be specified for each input file. If a single dict is specified instead of a list, it is used as the mapping for all input files. Each dict should include one or more of the following keys, which correspond to expected fields; the value should be the name of the field to map as the expected field.
  • ‘Chr’: The chromosome field, must be mapped for marker map generation
  • ‘Pos’: The position field, must be mapped for marker map generation
  • ‘Snp’: The SNP name field, must be mapped if Chr & Pos are not, for use as column name
  • ‘Sample’: The sample name field, used for sample and row name determination
  • ‘GC’: The GC score field, must be mapped for GC score filtering to succeed
An empty value may be specified for any key to specifically unmap it. In the case of ‘Sample’ this will result in the file names being used as the sample names.
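
For instance, a fieldMap override for two input files might look like the following sketch (the report field names ‘Chromosome’, ‘Position’, ‘Sample ID’, ‘SNP Name’ and ‘GC Score’ are hypothetical; substitute the headers that actually appear in your final report files):

```python
# One dict per input file; each key is an expected field, each value is
# the name of the report field to map onto it (hypothetical names).
field_map = [
    {'Chr': 'Chromosome', 'Pos': 'Position', 'Sample': 'Sample ID'},
    # An empty value unmaps a field; unmapping 'Sample' makes the file
    # name be used as the sample name for the second file.
    {'Snp': 'SNP Name', 'GC': 'GC Score', 'Sample': ''},
]
```

A single dict (rather than a list) applies the same mapping to every input file.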

filterGCScore : bool, default: False, optional

Indicates whether input data with a GC score below minimumGCScore should be filtered out (not imported).

minimumGCScore : real, default: 0.15, optional

Specifies the minimum GC Score input data must have in order to be imported.

outputSSList : list of strings, optional

Specifies the list of input field names to import as spreadsheets. The default list will include any field starting with ‘Allele’. Such fields are treated specially and combined into genotypic data.

outputMMList : list of strings, optional

Specifies the list of input field names to import as marker map fields.

nodeLogEnabled : bool, default: True, optional

Indicates whether node logs should be created for output spreadsheets. The node logs summarize all options set at the time of import for later reference.

Returns:

spreadsheet : Spreadsheet

First spreadsheet specified in the outputSSList.

Examples

>>> mySS = ghi.importIlluminaFinalReport(['C:\Data\Sample-A.txt',
                                          'C:\Data\Sample-B.txt'],
                                          'My Dataset')
>>> 
>>> mySS
spreadsheet(
 Name=My Dataset - Alleles - Top - Sheet 1
 Node ID=11
 Rows=2
 Columns=54020
 Data Set Name=My Dataset - Alleles - Top
)
>>> 

Genome Browser Operations

ghi.dataFetchListing(category)

Fetches listings from our data repository of optional resources.

When category is not set, the list of valid available categories is returned. If a category is provided, a list is returned where each entry is a list of information for an available file in that category.

Parameters:

category : string (one of below), optional

The category types include, but are not limited to:
  • ‘RefData’: Reference Data
  • ‘ExampleProjects’: Projects used in tutorials or other example projects
  • ‘MarkerMaps’: GHI curated marker maps
  • ‘PublicData’: GHI curated Public Datasets
  • ‘AffyLibraryFiles’: GHI hosted Affymetrix CDF library files
Returns:

dataListing : list of items in the data repository or a list of categories

Examples

>>> categories = ghi.dataFetchListing()
>>> 
>>> categories
[u'RefData', u'AffyLibraryFiles', u'GenomeAssemblies', 
 u'Annotations', u'MarkerMaps', u'ExampleProjects', u'PublicData', 
 u'Assemblies']
>>> dataListing = ghi.dataFetchListing('ExampleProjects')
>>>
>>> dataListing[0]
[u'http://data.goldenhelix.com/data/SVS/ExampleProjects/Variant%20Cl
assification%20-%20Complete.zip', u'Variant Classification - Complete',
u'46253804', u'46253804', u'11/14/2011', u'application/zip']
>>> 
ghi.dataFetchUrls(category, queryFileName)

Fetch a URL and MIME type for a specified file.

Parameters:

category : string (one of below)

The category types include, but are not limited to:
  • ‘RefData’: Reference Data
  • ‘ExampleProjects’: Projects used in tutorials or other example projects
  • ‘MarkerMaps’: GHI curated marker maps
  • ‘PublicData’: GHI curated Public Datasets
  • ‘AffyLibraryFiles’: GHI hosted Affymetrix CDF library files

queryFileName : string

Search the data repository in the specified category for files that match the query. The query can contain a wildcard prefix or postfix to potentially return a list of matching files.

Returns:

strings : list of pairs of URL and MIME type

Examples

>>> urlList = ghi.dataFetchUrls('MarkerMaps','Affy')
>>> 
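The wildcard behavior of queryFileName can be sketched in plain Python. Treating the query as a glob pattern (via fnmatch) is an assumption about the matching semantics, and the file names below are made up for illustration.

```python
import fnmatch

def match_repo_files(filenames, query):
    # Treat the query as a glob pattern; a query without a wildcard
    # is assumed to match as a prefix (e.g. 'Affy' -> 'Affy*').
    if '*' not in query:
        query += '*'
    return [name for name in filenames if fnmatch.fnmatch(name, query)]

files = ['AffySNP6_Map.dsm', 'Illumina550_Map.dsm', 'AffyCNV_Map.dsm']
matches = match_repo_files(files, 'Affy')
# matches -> ['AffySNP6_Map.dsm', 'AffyCNV_Map.dsm']
```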
ghi.dataAutoDownload(category, fileName, promptOverride, manualPath)

Auto download data from the data repository

This command looks up a file on our data repository server (see ghi.dataFetchUrls) and then downloads it to the application data folder if it is not already downloaded.

Parameters:

category : string (one of below)

The category types include, but are not limited to:
  • ‘RefData’: Reference Data
  • ‘ExampleProjects’: Projects used in tutorials or other example projects
  • ‘MarkerMaps’: GHI curated marker maps
  • ‘PublicData’: GHI curated Public Datasets
  • ‘AffyLibraryFiles’: GHI hosted Affymetrix CDF library files

fileName : string

Name of the file to download from the repository

promptOverride : bool, default: False, optional

Whether or not to prompt before overwriting the file if it already exists

manualPath : string, optional

If specified the data is downloaded to the specified path instead of the assigned path for the category type.

Returns:

filePath : string

File path of the downloaded data file.

Examples

>>> filePath = ghi.dataAutoDownload('RefData',
                                    'AffySNP6_YRI_SNP_ABRefs.dsf',
                                    True, 'd:/Tmp')
>>> 
>>> filePath
u'd:/Tmp/AffySNP6_YRI_SNP_ABRefs.dsf'
>>> 
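The download-if-absent behavior could be sketched as follows. Here `fetch` is a hypothetical callback standing in for the actual HTTP transfer, not part of the ghi API, and the sketch reduces promptOverride to a forced re-download rather than a GUI prompt.

```python
import os

def auto_download(url, dest_dir, file_name, fetch, prompt_override=False):
    # Reuse an existing copy unless asked to consider overwriting.
    # In the real command promptOverride triggers a prompt; as a
    # sketch, it simply forces the file to be fetched again.
    path = os.path.join(dest_dir, file_name)
    if os.path.exists(path) and not prompt_override:
        return path
    with open(path, 'wb') as f:
        f.write(fetch(url))
    return path
```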

Initialize Builder Operations

ghi.markerMapBuilder(markerMapName, *args)

A marker map can either be built “by markers” or “by fields”. The specifications for the function depend on how the map is to be built.

The preferred method of building marker maps is with the “by markers” parameters. The “by markers” option allows adding data to the marker map as it becomes known while markers are processed. To build a map by markers, pass the required parameters of typeList and fieldList into this function. Afterward, add the marker map data for each marker to the map individually.

markerMapBuilder(markerMapName, typeList, fieldList)

The alternate method is to build a map “by fields”. To use this method, data for any field for all of the markers must have been assembled into a list before that field can be added to the map. To build a map by fields, the required parameters of markers, chromosomes, and positions should be passed into this function, after which additional fields may be added one by one to the marker map.

markerMapBuilder(markerMapName, markerNames, chromosomes, positions)

Parameters:

markerMapName : string

A name for the marker map and for the DSM file.

By *markers* :

typeList : list of data types, (one or more from the following list)

List of data types for extra fields. Valid types include ghi.const.TypeInteger, ghi.const.TypeReal and ghi.const.TypeCategorical

fieldList : list of strings

List of the field names for the extra fields. Should be the same length as typeList

By *fields* :

markers : list of strings

A list of marker names

chromosomes : list of strings

A list of chromosome numbers and/or names as strings

positions : list of integers

A list of positions as integers

Returns:

mapBuilder : marker map builder object

mapBuilder will be either a by-marker or a by-field builder

Examples

>>> myByMarkerBuilder = ghi.markerMapBuilder("By Marker Map Builder",
                                             [ghi.const.TypeInteger],
                                             ["ProbeCount"])
>>> myByMarkerBuilder
marker_map_builder(
 Name = By Marker Map Builder
 # Fields = 1
 # Current Markers = 0
)
>>> myByMarkerBuilder.addMarker("my","1",1, [2])
True
>>> myByMarkerBuilder.finish()
u'~/AppData/Local/Golden Helix SVS/MarkerMaps/By Marker Map Builder.dsm'
>>> 
>>> 
>>> myByFieldBuilder = ghi.markerMapBuilder("By Field Map Builder",
                                            ['m1','m2','m3','m4','m5'],
                                            ['1','1','2','2','X'],
                                            [1,2,3,4,5])
>>> myByFieldBuilder
marker_map_builder(
 Name = By Field Map Builder
 # Markers = 5
)
>>> myByFieldBuilder.addIntField("Int field name",[1,1,1,2,2])
True
>>> myByFieldBuilder.finish(forProjectUseOnly=True)
u'~/AppData/Local/Golden Helix SVS/MarkerMaps/By Field Map Builder.dsm'
>>> 
>>> someNewlyBuiltSpreadsheet.setMarkerMap(myByFieldBuilder,
                                           columnOriented=True)
>>> 
ghi.dataSetBuilder(dataSetName, numRows)

Returns an object for use in building new datasets.

Parameters:

dataSetName : string

Name for the dataset in the project

numRows : integer, > 0

Number of rows in the new dataset

Returns:

datasetBuilder : DataSetBuilder Object

Examples

>>> myBuilder = ghi.dataSetBuilder('Recoded Genotype Spreadsheet', 500)
>>> myBuilder
builder(
 Name = Recoded Genotype Spreadsheet
 Rows = 500
 Cols = 0
)
>>> 

Progress and Status Dialog Operations

ghi.progressDialog(caption, numSteps, allowCancel)

Create a progress dialog.

Parameters:

caption : string

A caption for the progress dialog

numSteps : integer

The number of incremental steps to set the progress

allowCancel : bool, default: True

Whether or not to allow the progress bar to be canceled

Returns:

progressDialog : ProgressDialog Object

Examples

>>> myProgress = ghi.progressDialog("Scanning sheet", ss.numCols(), True)
>>> 
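A typical script drives the dialog in a loop, calling setProgress after each unit of work and checking wasCanceled. The sketch below substitutes a minimal stand-in class for the real dialog so the pattern runs outside the application; in a script the object would come from ghi.progressDialog(...) instead.

```python
class StandInProgressDialog:
    # Minimal stand-in mimicking the documented dialog methods.
    def __init__(self, caption, num_steps, allow_cancel=True):
        self.num_steps = num_steps
        self.current = 0
        self.finished = False
    def setProgress(self, value):
        self.current = value
    def wasCanceled(self):
        return False
    def finish(self):
        self.finished = True

num_cols = 8   # in a real script: ss.numCols()
progress = StandInProgressDialog("Scanning sheet", num_cols)
for col in range(1, num_cols + 1):
    if progress.wasCanceled():
        break
    # ... scan column `col` here ...
    progress.setProgress(col)
progress.finish()
```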
ghi.statusDialog(caption, allowCancel)

Create a status dialog.

Parameters:

caption : string

A caption for the status dialog

allowCancel : bool, default: False

Whether or not the status dialog can be canceled

Returns:

statusDialog : ProgressDialog Object

Examples

>>> myStatus = ghi.statusDialog("Opening File...", True)
>>> 

Prompt Operations

ghi.question(questionString)

This command allows the user to answer a yes or no question.

Parameters:

questionString : string

Text to put in prompt.

Returns:

answer : bool

Yes (True) or No (False) as answered by the user.

ghi.promptInteger(description, min, max)

This command displays a dialog box and allows a user to supply an integer value.

Parameters:

description : string

Instructions for the user

min : integer

Minimum value for a list box, if not specified there is no minimum value

max : integer

Maximum value for a list box, if not specified there is no maximum value

Returns:

integerValue : integer

Integer specified by the user

Examples

>>> myInt = ghi.promptInteger("Choose a number between 0 and 10:",
                              min=0,max=10)
>>> myInt
5
>>> 
ghi.promptSpreadsheet(prompt, requirements)

This command displays a dialog allowing the user to select a spreadsheet from the Project Navigator Window.

Parameters:

prompt : string, optional

A string specifying the text displayed in the spreadsheet chooser

requirements : list of integers, one or more from the following list, optional

A list of integers specifying the requirements for valid spreadsheets.
Returns:

spreadsheet : Spreadsheet Object

Examples

>>> mySS = ghi.promptSpreadsheet("Choose a spreadsheet",
                                 [ghi.const.ContainsBinary, 
                                  ghi.const.ContainsMappedReal])
>>>     
ghi.promptUser(items)

This command is a generalized prompter for information from the user.

Deprecated since version 7.4.0: Use promptDialog() instead

Parameters:items : list
Returns:list : list of values
ghi.promptDialog(items, scrollableLayout, width, height, title, okText)

This command is a generalized prompter for information from the user.

See the Help Manual for more information and examples.

Parameters:

items : list

scrollableLayout : bool

width : int

height : int

title : string

okText : string

Returns:

dictionary : map of specified values

Dictionary of values for each input prompt

Script Requirements

ghi.requireVersion(version)

Requires that the current version of the software is equal to or newer than the provided version. The version string can be just a major and minor number such as ‘7.4’ or a major, minor and bugfix number such as ‘7.4.1’.

Parameters:

version : string

Minimum software version number
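The comparison ghi.requireVersion performs can be sketched as a numeric, component-wise check on the dotted version strings. Padding a short version such as ‘7.4’ with zeros is an assumption about the exact semantics.

```python
def version_at_least(current, required):
    # Compare dotted versions numerically, component by component;
    # missing trailing components are treated as zero.
    cur = [int(part) for part in current.split('.')]
    req = [int(part) for part in required.split('.')]
    width = max(len(cur), len(req))
    cur += [0] * (width - len(cur))
    req += [0] * (width - len(req))
    return cur >= req

version_at_least('7.4.1', '7.4')   # -> True
version_at_least('7.4', '7.4.1')   # -> False
```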

Creating Custom Nodes

ghi.createScriptNode(displayName, scriptText, scriptIcon, parentNodeId, scriptExt)

Create a node that executes a python script when opened

Parameters:

displayName : string

Name of the new node

scriptText : string

The full text of the python script that will be executed when opened.

scriptIcon : string, default: python, optional

One of the following icon names:
  • histogram
  • scatter
  • ld
  • heatmap
  • variant
  • pie
  • plot
  • camera
  • regression

parentNodeId : integer, default: None, optional

When set, create this node with the given parent. Otherwise top-level.

scriptExt : string, optional

Should always be “py”; reserved for future expandability.

Returns:

scriptnode : ScriptNode Object

ghi.createResultViewer(displayName, **kws)

Create a custom Result Viewer node

When textString or htmlString are not passed in, you can build up the ResultViewer iteratively with its commands such as addHeader, addPairedList etc.

Otherwise this can display an existing text string as monospaced or an existing HTML string in rendered form.

Parameters:

displayName : string

Name of the new node

textString : string, default: None, optional

When set, this node will display plain text (monospaced)

htmlString : string, default: None, optional

When set, this node will display in HTML format

parentNodeId : integer, default: None, optional

When set, create this node with the given parent. Otherwise top-level.

Returns:

resultviewer : ResultViewer Object

Examples

>>> rv = ghi.createResultViewer("Custom Result Viewer")
>>> rv.addHeader("Custom Results")
>>> rv.addSubHeader("Some Key/Value Pairs")
>>> rv.addPairedList([["Some Stat",0.9999],["Some Param","Param Value"],
                      ["Count",10]])
>>> rv.addSubHeader("A Table")
>>> rv.addTable(["Labels","Stat1","Stat2"],[["Row 1",0.25,10],
                ["Row 2",0.75,11],["Row 3",0.99,-5]])
>>> rv.addSubHeader("A Labeled List")
>>> rv.addList("Detected Chromosomes",["1","2","X","Y"])
>>> rv.show()
>>> 
>>> rv2 = ghi.createResultViewer("Monospaced Text", 
              textString="testing\nOne\tTwo\n1.0\t2.0\n")
>>> rv2.show()
>>> 
>>> rv3 = ghi.createResultViewer("HTML Text", 
              htmlString="<h3>A Header</h3><p>Testing a paragraph</p>")
>>> rv3.show()

Genome Assembly Operations

ghi.genomeAssembly(coordSysId)

Returns a dictionary representation of a genome assembly.

Parameters:

coordSysId : string

Genome identifier, usually as defined by http://www.dasregistry.org, e.g. ‘GRCh_37,Chromosome,Homo sapiens’

Returns:

genomeMap : dict of assembly data

Examples

>>> ghi.genomeAssembly('GRCh_37,Chromosome,Homo sapiens')
{u'modified': u'2013-01-01T00:00:00', 
 u'coordinates': u'GRCh_37,Chromosome,Homo sapiens', 
 u'taxId': u'9606', u'genBankId': u'GCA_000001405.1', 
 u'build': u'GRCh37 hg19', 
 u'date': u'2009-02-27', 
 u'refSeqId': u'GCF_000001405.13', 
 u'segment': [
   {u'length': 249250621, u'type': u'autosome', u'name': [u'1']}, 
   {u'length': 243199373, u'type': u'autosome', u'name': [u'2']}, 
   {u'length': 198022430, u'type': u'autosome', u'name': [u'3']}, 
   {u'length': 191154276, u'type': u'autosome', u'name': [u'4']}, 
   {u'length': 180915260, u'type': u'autosome', u'name': [u'5']}, 
   {u'length': 171115067, u'type': u'autosome', u'name': [u'6']}, 
   {u'length': 159138663, u'type': u'autosome', u'name': [u'7']}, 
   {u'length': 146364022, u'type': u'autosome', u'name': [u'8']}, 
   {u'length': 141213431, u'type': u'autosome', u'name': [u'9']}, 
   {u'length': 135534747, u'type': u'autosome', u'name': [u'10']}, 
   {u'length': 135006516, u'type': u'autosome', u'name': [u'11']}, 
   {u'length': 133851895, u'type': u'autosome', u'name': [u'12']}, 
   {u'length': 115169878, u'type': u'autosome', u'name': [u'13']}, 
   {u'length': 107349540, u'type': u'autosome', u'name': [u'14']}, 
   {u'length': 102531392, u'type': u'autosome', u'name': [u'15']}, 
   {u'length': 90354753, u'type': u'autosome', u'name': [u'16']}, 
   {u'length': 81195210, u'type': u'autosome', u'name': [u'17']}, 
   {u'length': 78077248, u'type': u'autosome', u'name': [u'18']}, 
   {u'length': 59128983, u'type': u'autosome', u'name': [u'19']}, 
   {u'length': 63025520, u'type': u'autosome', u'name': [u'20']}, 
   {u'length': 48129895, u'type': u'autosome', u'name': [u'21']}, 
   {u'length': 51304566, u'type': u'autosome', u'name': [u'22']}, 
   {u'length': 155270560, u'type': u'allosome', u'name': [u'X']}, 
   {u'length': 59373566, u'type': u'allosome', u'name': [u'Y']}, 
   {u'visible': u'data', u'length': 16571, u'type': u'mitochondrial', 
    u'name': [u'M', u'MT']}, 
   {u'visible': u'data', u'length': 155270560, u'type': u'allosome', 
    u'name': [u'XY']}], u'common': [u'Human', u'Man']}
 >>> 
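The returned dict can be traversed like any nested Python structure, for instance to total segment lengths. The three-segment dict below is a trimmed stand-in shaped like the example output above, so the sketch runs outside the application.

```python
# Trimmed stand-in shaped like a genomeAssembly() result.
assembly = {
    'build': 'GRCh37 hg19',
    'segment': [
        {'length': 249250621, 'type': 'autosome', 'name': ['1']},
        {'length': 155270560, 'type': 'allosome', 'name': ['X']},
        {'length': 16571, 'type': 'mitochondrial', 'name': ['M', 'MT']},
    ],
}

total_length = sum(seg['length'] for seg in assembly['segment'])
autosome_names = [seg['name'][0] for seg in assembly['segment']
                  if seg['type'] == 'autosome']
# autosome_names -> ['1']
```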

Named Integer Constants

This module consists of named integer constants of the form ghi.const.*, where the star depends on the API the constant is used for.

AppProperties

ghi.const.AppPath
ghi.const.ProjectPath
ghi.const.MarkerMapPath
ghi.const.MarkerMapsPath
ghi.const.UserScriptsPath
ghi.const.DataPath
ghi.const.CommonDataPath
ghi.const.AnnotationsPath
ghi.const.AssembliesPath
ghi.const.SysTempPath
ghi.const.ProjectTempPath
ghi.const.DefaultGenomeMap
ghi.const.SysDataPath
ghi.const.SysGenomeMapPath
ghi.const.Version

File Chooser Type

For ghi.chooseFile mode: FileChooserType

ghi.const.ChooseSingleFile
ghi.const.ChooseMultipleFiles
ghi.const.ChooseSaveAs

Import Header Mode

For ghi.importData columnHeaderMode: ImportHeaderMode

ghi.const.HeaderAutoDetect
ghi.const.HeaderFirstRow
ghi.const.HeaderGenerate

For ComputeLD API

For ss.computeLD

For ss.computeLD Method

Haplotype imputation method:

ghi.const.ImputeCHM
ghi.const.ImputeEM
ghi.const.StatRSquared
ghi.const.StatDPrime
ghi.const.SegmentUnivariate
ghi.const.SegmentMultivariate
ghi.const.SegmentMaxPer10k
ghi.const.SegmentMaxPerChr
ghi.const.SegmentOuputFirstColumn
ghi.const.SegmentOuputEveryColumn
ghi.const.PcaUncentered
ghi.const.PcaCenteredByMarker
ghi.const.PcaCenteredBySample
ghi.const.PcaCenteredByMarkerAndSample
ghi.const.ModelAllelic
ghi.const.ModelGenotypic
ghi.const.ModelAdditive
ghi.const.ModelDominant
ghi.const.ModelRecessive
ghi.const.PcaNormDefault
ghi.const.PcaNormActual
ghi.const.PcaNormNone
ghi.const.PcaModelAdditive
ghi.const.PcaModelDominant
ghi.const.PcaModelRecessive
ghi.const.StateInactive
ghi.const.StateActive
ghi.const.StateDependent
ghi.const.TypeBinary
ghi.const.TypeInteger
ghi.const.TypeReal
ghi.const.TypeCategorical
ghi.const.TypeGenotypic
ghi.const.HapBlockPrecomputed
ghi.const.HapBlockAllMarkers
ghi.const.HapBlockMovingWindow
ghi.const.WindowFixed
ghi.const.WindowDynamic
ghi.const.TestsPerHaplotype
ghi.const.TestsPerBlock
ghi.const.HapImputeEM
ghi.const.HapImputeCHM
ghi.const.BlockDetectGabriel
ghi.const.Confidence90
ghi.const.Confidence95
ghi.const.Confidence99
ghi.const.DupKeepBoth
ghi.const.DupFillLeft
ghi.const.DupFillRight
ghi.const.RegressPerColumn
ghi.const.RegressMovingWindow
ghi.const.RegressSelected
ghi.const.ThresholdMainP
ghi.const.ThresholdLogMainP
ghi.const.ThresholdFullP
ghi.const.ThresholdLogFullModelP
ghi.const.ThresholdRSquared
ghi.const.StepwiseBackward
ghi.const.StepwiseForward
ghi.const.DataTypeBinary
ghi.const.DataTypeInteger
ghi.const.DataTypeFloat
ghi.const.DataTypeDouble
ghi.const.DataTypeCategorical
ghi.const.DataTypeGenotypic
ghi.const.CasesAndControls
ghi.const.ControlsOnly
ghi.const.CasesOnly
ghi.const.LessThan
ghi.const.LessOrEqual
ghi.const.GreaterThan
ghi.const.GreaterOrEqual
ghi.const.IsPedigree
ghi.const.ContainsGenetic
ghi.const.ContainsMappedGenetic
ghi.const.ContainsMappedReal
ghi.const.IsPrecomputedPca
ghi.const.ContainsBinary
ghi.const.ContainsInteger
ghi.const.MapOrientationNone
ghi.const.MapOrientationColumns
ghi.const.MapOrientationRows
ghi.const.NoGuiElements
ghi.const.AllGuiElements
ghi.const.ProgressOnly
ghi.const.ReferenceAll
ghi.const.ReferencePopulation
ghi.const.ReferenceSubset
ghi.const.OrientationVertical
ghi.const.OrientationHorizontal
ghi.const.FilterActiveOnly
ghi.const.FilterDependent
ghi.const.FilterActive
ghi.const.FilterMapped
ghi.const.FilterBinary
ghi.const.FilterGenotypic
ghi.const.FilterCategorical
ghi.const.FilterInt
ghi.const.FilterReal
ghi.const.FilterQuantitative
ghi.const.UnsortedRows
ghi.const.ReadModeOverlap
ghi.const.ReadModeExact
ghi.const.IntervalModeIndexed
ghi.const.IntervalModeHalfOpen
ghi.const.ArrayModeNative
ghi.const.ArrayModeTyped
ghi.const.FieldIndexChr
ghi.const.FieldIndexStart
ghi.const.FieldIndexStop
ghi.const.FieldIndexId
ghi.const.TagYellow
ghi.const.TagRed
ghi.const.TagBlue
ghi.const.TagGreen
ghi.const.TagNone

Progress Dialog Module

This module contains functions that act on progress dialogs.

progressdialog.finish()

Finishes and closes the current progress or status dialog.

progressdialog.hide()

Closes or hides the progress or status dialog.

progressdialog.isHidden()

Checks to see if the progress or status dialog is hidden.

Returns:

hidden : bool

True if the dialog is hidden, False if otherwise

progressdialog.reset(newMessage, andClose)

This command updates an existing progress dialog.

Parameters:

newMessage : string

A new message to display in the progress dialog

andClose : bool

Indicates whether the old dialog should be closed

  • 0 do not close the old dialog
  • 1 (default) close the old dialog
progressdialog.setDoubleProgressMode(autoIncrement, numSteps, newLabel, showCancelOption, keepStepHistory)

This command updates an existing progress dialog to add an additional progress bar for a sub-operation.

Parameters:

autoIncrement : bool

Indicates whether the progress bar should auto increment

numSteps : int

Set the number of steps for the secondary progress bar

newLabel : string

Specifies a new label string for the secondary progress bar

showCancelOption : bool

Allows the process to be canceled
  • False do not allow cancel
  • True (default) allow cancel

keepStepHistory : bool

Indicates whether the step history should be kept
  • False do not keep the step history
  • True (default) keep the step history
progressdialog.setMessage(message)

This command allows you to update the message in the progress or status dialog without having to reset the counter.

Parameters:message : string
progressdialog.setMinimumDuration(ms)

Sets the number of milliseconds the progress dialog should wait before displaying.

Parameters:

ms : int

The number of milliseconds before showing the dialog

progressdialog.setMode(mode)

This command indicates whether the progress dialog is used for a single- or multi-step process.

Parameters:

mode : int

The mode of the progress dialog:

  • 0: Single-step process
  • 1: Multi-step process
progressdialog.setProgress(progress)

This command advances the progress using the given integer value.

Parameters:

progress : int

The value must be between 0 and the total number of progress steps.

progressdialog.setProgressMode(numSteps, newLabel, showCancelOption, keepStepHistory)

This command updates the progress dialog to keep the dialog active for longer than was originally set.

Parameters:

numSteps : int

Set the number of steps for the updated progress dialog

newLabel : string

Specifies a new label string for the updated progress dialog

showCancelOption : bool

Allows the process to be canceled

  • False do not allow cancel
  • True (default) allow cancel

keepStepHistory : bool

Indicates whether the step history should be kept

  • False do not keep the step history
  • True (default) keep the step history
progressdialog.setSecondaryMessage(message)

This command sets the message for the secondary progress bar.

Parameters:

message : string

The message to display for the secondary progress.

progressdialog.setSecondaryProgress(progress)

This command advances the secondary progress using the given integer value.

Parameters:

progress : int

The value must be between 0 and the total number of secondary progress steps.

progressdialog.setSecondaryTotalSteps(totalSteps)

This command sets the total number of secondary steps for use in the secondary progress bar.

Parameters:totalSteps : int
progressdialog.setStatusMode(newStatus, showCancelOption, keepStepHistory)

This command renews the status dialog to keep the status dialog active with a new message and options.

Parameters:

newStatus : string

Specify a new label string for the status dialog

showCancelOption : bool

Allows the process to be canceled
  • False do not allow cancel
  • True (default) allow cancel

keepStepHistory : bool

Indicates whether the step history should be kept
  • False do not keep the step history
  • True (default) keep the step history
progressdialog.show()

Show the progress or status dialog, or bring the dialog to the front.

progressdialog.showCancel(cancel)

Either shows or removes a cancel button from the progress or status dialog.

Parameters:

cancel : bool

  • False hide cancel button on dialog
  • True show cancel button on dialog
progressdialog.value()

Returns the last integer used to update the progress dialog.

Returns:value : int
progressdialog.wasCanceled()

This command checks to see if the progress or status dialog was canceled.

Returns:

wasCanceled : bool

  • True if the dialog was canceled
  • False if the dialog is still active

Spreadsheet Module

This module contains functions that act on a spreadsheet object. The commands listed under Tabular Module and Navigator Node Module will also act on a spreadsheet object.

spreadsheet.activeSubset()

Creates a new spreadsheet object with all of the active data of the current spreadsheet object.

If all of the data is active, then an error message will be displayed indicating that some of the data needs to be inactivated in order to create an active subset spreadsheet.

Returns:spreadsheet : Spreadsheet Object

Examples

>>> myNewSS = mySS.activeSubset()
>>> 
spreadsheet.activateByChromosome(activeChr)

This function activates marker mapped rows/columns for which the listed chromosome is contained in the activeChr list. Marker mapped columns which have chromosomes that are not contained in the activeChr list will be inactivated.

Parameters:

activeChr : list of strings

List of chromosomes which should be used to activate rows/columns

Examples

>>> mySS.activateByChromosome(['1','2','X'])
>>> 
spreadsheet.appendSpreadsheet(nodeId, datasetName, dropColumns, addToProjectRoot, caseSensitive)

This command appends a spreadsheet to the current spreadsheet object.

Parameters:

nodeId : integer

Navigator node id of the spreadsheet to append

datasetName : string

New dataset name for the appended spreadsheet

dropColumns : bool, default: True, optional

Whether or not to drop non-matching columns. If columns are kept, then the empty cells are filled with missing, ‘?’.

addToProjectRoot : bool, default: False, optional

Whether or not a new spreadsheet node is created as a child of the project root or not. If not created as a child of the project root, the spreadsheet is created as a child of the current spreadsheet.

caseSensitive : bool, default: True, optional

Whether to match the column headers case-sensitively or case-insensitively.

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> ss = ghi.getCurrentObject()
>>> ss.appendSpreadsheet(4,"My SS appended with SS Node 4", 
                         dropColumns = 0, addToProjectRoot = 1, 
                         caseSensitive = False)
spreadsheet(
 Name=My SS appended with SS Node 4 - Sheet 1
 Node ID=123
 Rows=6000
 Columns=99
 Data Set Name=My SS appended with SS Node 4
)
spreadsheet.applyMarkerMap(mmFileName, columnOriented, dropDuplicates)

This command applies a genetic marker map from the marker maps folder to the current spreadsheet object.

Parameters:

mmFileName : string

Marker map DSM file name in marker maps folder

columnOriented : bool, default: True, optional

Whether or not the marker names are column headers. If this is false then the marker names are row labels.

  • 0 = marker names are row labels
  • 1 = marker names are column headers

dropDuplicates : bool, default: False, optional

Whether or not to just apply the map to the first instance of a marker name.

  • 0 = apply the map to all markers including duplicates
  • 1 = apply the map to only the first of each marker name; delete all other columns or rows for that marker name
Returns:

spreadsheet : Spreadsheet Object

Examples

>>> myMarkerMappedSS = mySS.applyMarkerMap("myMarkerMap.dsm", 
                                           columnOriented = 1,
                                           dropDuplicates = 0)
>>> 
spreadsheet.bayesCMethods(**kws)

This command runs the Bayes C or Bayes C-pi genomic prediction method using the current spreadsheet.

Parameters:

initialPi : double, >= 0.0, default: 0.5

Either the initial value of pi for Bayes C-pi, or the fixed value of pi for Bayes C.

treatMissingAsHomo : bool, default: True

If true, missing genotypes will be imputed as homozygous major. If false, missing genotypes will be imputed as the average of all the non-missing genotypes for that marker.

correctForGender : bool, default: False

If true, gender correction will be used to compute different ASE values for males and females. A column encoding the sex of each sample and the hemizygous chromosome must be specified.

sexColumn : integer, default: -2

The column that encodes the sex for each sample. Must be included for gender correction.

hemiChrome : string, default: “X”

The chromosome that is hemizygous for males. Must be included for gender correction.

useKinship : bool, default: False

Whether or not to include a pre-computed kinship matrix. The node ID of the kinship matrix must be included.

kinshipMatrix : integer

The node ID of the pre-computed kinship matrix.

addCovariates : bool, default: False

Whether or not to use fixed effects in this Bayesian analysis. The column numbers of the fixed effects must be included.

covariates : list of integers

The list of column numbers for the fixed effects to be included in this analysis.

dropMissingPheno : bool, default: True

If true, samples with missing phenotypes will be excluded from the analysis. If false, missing phenotypes will be predicted and their genotypes will be included in the analysis.

numIterations : integer, > 0, default: 50000

The number of iterations for the MCMC step.

numBurnIn : integer, >= 0, default: 0

The number of samples to be thrown out at the beginning of the MCMC iterations.

numThin : integer, >= 0, default: 0

One out of every x samples will be kept, where x is numThin.

useBayesCPi : bool, default: True

If true, the Bayes C-pi method will be used. If false, the Bayes C method will be used.

modifiedOutput : bool, default: False

If true, spreadsheets containing just the ASE, fixed effect coefficients, and the predicted phenotypes will be output. This is usually used for cross validation.

foldNumber : integer, default: -1

The current fold number for K-fold cross validation. This value will be used on the output spreadsheets if modifiedOutput is chosen.

centerGenotypes : bool, default: False

If true, the mean for each marker will be subtracted from each genotype. If false, genotype values will be left as is: 0, 1, or 2.

Returns:

Outputs:

  • 0: Bayes (C/C Pi) estimates by marker
  • 1: Bayes (C/C Pi) estimates by sample
  • 2: Bayes (C/C Pi) Genomic Relationship Matrix (if no pre-computed matrix)
  • 3: Bayes (C/C Pi) Run Log
  • 4: Bayes (C/C Pi) Trace Spreadsheet
  • 5: Plot of Numeric Values from Bayes (C/C Pi) Trace Spreadsheet

If in modified mode, the following outputs are available:

  • 0: Bayes (C/C Pi) Genomic Relationship Matrix (if no pre-computed matrix)
  • 1: Bayes (C/C Pi) estimates by marker - Fold #
  • 2: Bayes (C/C Pi) fixed effect coefficients - Fold #
  • 3: Bayes (C/C Pi) estimates by sample - Fold #
  • 4: Bayes (C/C Pi) Run Log - Fold #
  • 5: Bayes (C/C Pi) Trace Spreadsheet - Fold #
  • 6: Plot of Numeric Values from Bayes (C/C Pi) Trace Spreadsheet - Fold #

Examples

>>> mySSList = mySS.bayesCMethods(useBayesCPi = False)
>>>
spreadsheet.cell(rowNumber, colNumber)

This command obtains the value from the specified spreadsheet cell.

Parameters:

rowNumber : integer, >= 0

The row number for desired row, 0 = column name headers

colNumber : integer, >= 0

The column number for desired column, 0 = row labels

Returns:

cellValue : various types

Examples

>>> value = mySS.cell(1,3)
>>> value
'A_G'
>>> 
spreadsheet.cnamSegmentation(**kws)

This command performs CNAM optimal segmenting analysis on the current spreadsheet object and returns a list of objects containing the results of the segmentation procedure.

Parameters:

algorithm : integer, (one of below)

Whether a univariate or multivariate algorithm should be used.

useMovingWindow : bool, default: False

Whether or not a moving window should be used.

windowSize : integer, > 0, default: 20000

The size of the moving window to be used, requires useMovingWindow

maxSegmentsType : integer, (one of below)

Indicate how to specify the maximum number of segments.

maxSegmentsPer10k : integer, > 0, default: 10

The maximum number of expected segments for every 10,000 markers. Only used when maxSegmentsType is SegmentMaxPer10k.

maxSegmentsPerChr : integer, > 0, default: 100

The maximum number of expected segments for a chromosome/window. Only used when maxSegmentsType is SegmentMaxPerChr. If useMovingWindow == True then the maximum number of segments is per window instead of per chromosome.

minMarkers : integer, > 0, default: 1

The minimum number of markers needed to be considered a segment.

maxPairwisePVal : double, (0.0, 1.0], default: 0.005

The significance level for comparing adjacent segments. Set to 1.0 to skip permutation testing.

numThreads : integer, [1, 64], default: 2

The number of threads to use when computing optimal segments

outputFirstColumn : bool, default: True

Whether or not to create output for the first column of each segment only.

outputEveryColumn : bool, default: False

Whether or not to create output for every marker.

fullLogging : bool, default: False

Whether or not to produce a detailed output log

wiggleFile : string

The path to write a UCSC wiggle file to. If not specified, no wiggle file will be created.

removeOutliers : bool, default: True

Whether or not to remove univariate outliers

useHardwareAcceleration : bool, default: False

Whether or not to enable OpenCL accelerated segmentation

openCLDevice : integer, >= 1, default: 1

Index of the OpenCL device to use. Requires useHardwareAcceleration = True.

specifyMemoryLimit : bool, default: False

Whether or not CNAM should use a user specified memory limit. Requires algorithm = ghi.const.SegmentMultivariate.

memoryLimit : integer, > 0, default: 64

Memory limit in MB. Requires algorithm = ghi.const.SegmentMultivariate and specifyMemoryLimit = True.

Returns:

spreadsheets : a list of spreadsheet objects of length 4 as follows

  • 0: segmentation covariates spreadsheet (every column)
  • 1: segmentation covariates spreadsheet (first column only)
  • 2: segment list spreadsheet
  • 3: segmentation run log viewer
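Unlike most methods in this section, cnamSegmentation has no worked example above. The results come back as a positional list, so a minimal sketch of the unpacking convention may help; the helper function below is hypothetical (not part of the API), and only the documented return order is assumed:

```python
# Hypothetical helper: give the four positional results of
# spreadsheet.cnamSegmentation() descriptive names. In a real script the
# input would be, e.g.:
#   results = mySS.cnamSegmentation(maxSegmentsType = ghi.const.SegmentMaxPerChr)
def unpack_cnam_results(results):
    """Return the four result objects keyed by their documented roles."""
    covar_every, covar_first, segment_list, run_log = results
    return {
        "covariates_every_column": covar_every,
        "covariates_first_column": covar_first,
        "segment_list": segment_list,
        "run_log": run_log,
    }
```
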
spreadsheet.col(colNumber, state)

Returns, as a list, all data except the column name header from the specified column (1-based index) of the current spreadsheet object.

Parameters:

colNumber : integer, >= 0

The column number of desired column, 0 = row labels

state : integer, (one of below), optional

Returns:

column : list of single value type

List of data from the column for rows that meet the specified state parameter. By design, a column contains only data of a single type.

See also

zcol, datamodel.col

Examples

>>> column = mySS.col(5, ghi.const.StateActive)
>>> column[0:5]
[62.385, 65.5234, ?, 76.868]
>>> 
spreadsheet.zcol(colNumber, state)

Returns, as a list, all data except the column name header from the specified column (0-based index) of the current spreadsheet object.

Parameters:

colNumber : integer, >= 0

The column number of desired column, 0 = first column of data

state : integer, (one of below), optional

Returns:

column : list of single value type

List of data from the column for rows that meet the specified state parameter. By design, a column contains only data of a single type.

See also

col, datamodel.zcol

Examples

>>> column = mySS.zcol(4, ghi.const.StateActive)
>>> column[0:5]
[62.385, 65.5234, ?, 76.868]
>>> 
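The only difference between col and zcol is the index base, so col(n) and zcol(n - 1) name the same column. A toy stand-in class (not the real spreadsheet object) makes the off-by-one relationship concrete:

```python
# Toy stand-in illustrating the indexing convention only:
# col() is 1-based (0 = row labels), zcol() is 0-based (0 = first data column).
class _SheetLike:
    def __init__(self, columns):
        self._columns = columns            # list of column-value lists

    def col(self, colNumber):              # 1-based, as in spreadsheet.col
        return self._columns[colNumber - 1]

    def zcol(self, colNumber):             # 0-based, as in spreadsheet.zcol
        return self._columns[colNumber]

sheet = _SheetLike([[62.385], [65.5234], [76.868], [70.1], [68.2]])
assert sheet.col(5) == sheet.zcol(4)       # same column, two index bases
```
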
spreadsheet.colHeaders(state)

Returns a list of the column headers.

Parameters:

state : integer, (one of below)

Returns:

columnHeaders : list of column headers

List of column headers that meet the specified column state parameter.

Examples

>>> colHeadsDependent = mySS.colHeaders(ghi.const.StateDependent)
>>> colHeadsDependent
['Case/Control']
>>> colHeadsIndependent = mySS.colHeaders(ghi.const.StateActive)
>>> colHeadsIndependent[0:5]
['weight (lbs)', 'height (in)', 'age', 'Lab 1']
>>> 
spreadsheet.colIndexes(colType, state)

Returns a list of 1-based column indexes for a given column type and state.

Parameters:

colType : integer, (one of below), optional

state : integer, (one of below), optional

Returns:

colIndexes : list of column indexes

List of column indexes that meet the specified parameters.

Examples

>>> colIdxsAll = mySS.colIndexes()
>>> colIdxsBinary = mySS.colIndexes(ghi.const.TypeBinary)
>>> colIdxsActive = mySS.colIndexes(-1, ghi.const.StateActive)
>>> colIdxsBinaryActive = mySS.colIndexes(ghi.const.TypeBinary, 
                                          ghi.const.StateActive)
>>> 
spreadsheet.colState(column)

This command takes a column index as a parameter and returns the state of the specified column.

The possible column states are as follows:

Parameters:

column : integer, >= 1

Column number

Returns:

columnState : integer, (one of below)

Examples

>>> columnState = mySS.colState(5)
>>> columnState
0
>>> 
spreadsheet.colSubset()

Creates a new spreadsheet object from the active columns of the current spreadsheet object.

Returns:

spreadsheet : Spreadsheet Object

Subset of the original spreadsheet created from all of the active columns

Examples

>>> subsetSS = mySS.colSubset()
>>> subsetSS
spreadsheet(
 Name=Phenotype + HM_500K_LogRs - Column Subset
 Node ID=295
 Rows=270
 Columns=22022
 Data Set Name=Phenotype + HM_500K_LogRs
)
>>> 
spreadsheet.createTopLevelSpreadsheet(datasetName, activeDataOnly)

This command takes the current spreadsheet object and creates a new dataset that is a child of the project root. The top level spreadsheet can be an exact copy or a subset based on active data only.

Parameters:

datasetName : string

New dataset name for the top level spreadsheet.

activeDataOnly : bool, default: True, optional

Whether or not only active data should be used to create the top level spreadsheet.

Returns:

spreadsheet : Spreadsheet Object

Top-level spreadsheet created from the original spreadsheet.

Examples

>>> myTopLevelSS = mySS.createTopLevelSpreadsheet("My New SS", 0)
>>> 
spreadsheet.dropMarkerMap()

This command creates a new spreadsheet where the current marker map is removed and returns a reference to that spreadsheet.

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> noMapSS = mySS.dropMarkerMap()
>>> 
spreadsheet.dataModel(flags)

Create a datamodel based on the filtering flags provided. A DataModel is a view of the spreadsheet that can look at just active data and data of specific types. This is useful for analysis where a method has constraints on what data it can handle.

Parameters:

flags : integer, (one or more of below)

The flags are a logical OR (‘|’) combination of the following constants.

Returns:

datamodel : DataModel Object

See also

Datamodel Module

Notes

The ghi.const.FilterMapped tag does not follow the logical OR but rather a logical AND. For example, (ghi.const.FilterMapped | ghi.const.FilterReal) would specify only Real columns that are mapped.

Examples

>>> myDM = mySS.dataModel(ghi.const.FilterBinary | ghi.const.FilterInt)
>>> 
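The OR-then-AND semantics described in the Notes can be sketched with hypothetical bit values (the real ghi.const values are opaque integers; the stand-in constants below are assumptions for illustration only):

```python
# Sketch of the dataModel flag semantics. Type-filter bits combine with OR,
# while FilterMapped narrows the result to mapped columns (an AND constraint).
FILTER_BINARY = 0x1   # assumed stand-in for ghi.const.FilterBinary
FILTER_INT    = 0x2   # assumed stand-in for ghi.const.FilterInt
FILTER_REAL   = 0x4   # assumed stand-in for ghi.const.FilterReal
FILTER_MAPPED = 0x8   # assumed stand-in for ghi.const.FilterMapped

def column_passes(col_type_bit, is_mapped, flags):
    """Would a column of this type/mapping survive the filter flags?"""
    if not (flags & col_type_bit):
        return False                       # type bits combine with OR
    if (flags & FILTER_MAPPED) and not is_mapped:
        return False                       # FilterMapped is an AND constraint
    return True

flags = FILTER_MAPPED | FILTER_REAL        # only Real columns that are mapped
assert column_passes(FILTER_REAL, True, flags)
assert not column_passes(FILTER_REAL, False, flags)
assert not column_passes(FILTER_INT, True, flags)
```
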
spreadsheet.exportCSV(saveFileName, activeDataOnly)

Exports the current spreadsheet object to a comma-delimited text file.

Parameters:

saveFileName : string

The path and file name for the location to save the data

activeDataOnly : bool, default: True

Whether or not only active data should be exported

Examples

>>> mySS.exportCSV("/data/results.csv",0)
>>> 
spreadsheet.exportDSF(saveFileName, datasetName, activeDataOnly)

Exports the current spreadsheet object to a DSF file including the applied marker map (if applicable).

Parameters:

saveFileName : string

The path and file name for the location to save the data

datasetName : string

Dataset name for the DSF file

activeDataOnly : bool, default: True

Whether or not only active data should be exported

Examples

>>> mySS.exportDSF("/data/results.dsf",'My Results Dataset', 1)
>>> 
spreadsheet.exportGHD(saveFileName, activeDataOnly)

Exports the current spreadsheet object to a GHD file.

Parameters:

saveFileName : string

The path and file name for the location to save the data

activeDataOnly : bool, default: True

Whether or not only active data should be exported

Examples

>>> mySS.exportGHD("/data/results.ghd",0)
>>> 
spreadsheet.exportMarkerMap(saveFileName, saveDatasetName, activeDataOnly)

Exports the marker map applied to the current spreadsheet object to a DSM file.

Parameters:

saveFileName : string

The path and file name for the export of the DSM file

saveDatasetName : string, optional

Specifies the name of the marker map dataset

activeDataOnly : bool, default: True

Whether or not only active marker map data is saved

Examples

>>> mySS.exportMarkerMap("/data/myresultsmap.dsm", activeDataOnly = True,
                         saveDatasetName = "My Results Marker Map")
>>> 
spreadsheet.exportToFile(saveFileName, **kws)

This command exports the current spreadsheet object to a text or third-party file format.

Parameters:

saveFileName : string

The path and file name to a specified file format

exportMarkerMap : bool, default: True

Whether or not to export marker map data. Marker map information can only be exported if the marker map is applied to row labels.

activeDataOnly : bool, default: True

Whether or not to export active data only

saveRowLabels : bool, default: True

Whether or not row labels should be exported

saveColHeaders : bool, default: True

Whether or not column headers should be exported

fieldDelimiter : string, (one of below if applicable)

Indicates the text delimiter
  • ‘,’: comma separated (default)
  • ‘ ‘: whitespace delimited
  • ‘\t’: tab delimited
  • other one character string

alleleDelimiter : string, default: ‘_’

The allele delimiter, any one character string

missingValues : string, default: ‘?’

The missing value string, any one character string

missingAllele : string, default: ‘?’

The missing allele string, any one character string

Examples

>>> mySS.exportToFile("/data/exportToFile.txt", activeDataOnly = 0,
                      fieldDelimiter = '\t', missingValues = '-')
>>> 
spreadsheet.exportTransposeToFile(saveFileName, **kws)

This command exports the transpose of the current spreadsheet object to a text or third-party file.

Parameters:

saveFileName : string

The path and file name to a specified file format

exportMarkerMap : bool, default: True

Whether or not to export marker map data. Marker map information can only be exported if the marker map is applied to column headers.

activeDataOnly : bool, default: True

Whether or not to export active data only

saveRowLabels : bool, default: True

Whether or not row labels should be exported

saveColHeaders : bool, default: True

Whether or not column headers should be exported

fieldDelimiter : string, (one of below if applicable)

Indicates the text delimiter
  • ‘,’: comma separated (default)
  • ‘ ‘: whitespace delimited
  • ‘\t’: tab delimited
  • other one character string

alleleDelimiter : string, default: ‘_’

The allele delimiter, any one character string

missingValues : string, default: ‘?’

The missing value string, any one character string

missingAllele : string, default: ‘?’

The missing allele string, any one character string

columnLabel : string, default: ‘Columns’

Specifies a label for the column name headers

Examples

>>> mySS.exportTransposeToFile("/data/exportTransposeToFile.txt",
                               activeDataOnly = True,
                               fieldDelimiter = '\t',
                               missingValues = '?')
>>> 
spreadsheet.filterGenotypes(**kws)

This command performs genotype filtering, with the option to inactivate columns that do not pass filtering criteria, as well as an option to create a spreadsheet containing column statistics and filtering status.

When specifying a statistic to use for filtering, a two item list is required. The first value in the list should be the filtering parameter and the second value should be the threshold for filtering.

Ex: callRate = [ghi.const.LessThan, 0.95]

Filtering types are represented as follows:
Parameters:

alleleRefField : string, default: “”

  • If a non-empty string is specified, alleles should be classified by reference allele vs. alternate allele, using this marker map field to determine the reference allele.
  • If an empty string is specified or this parameter is omitted, alleles will be classified by major allele vs. minor allele as determined from the data.

outputMarkersSheet : bool, default: True

Whether or not to output a spreadsheet containing statistics and filter status for each marker in the original spreadsheet.

inactivateCols : bool, default: True

Whether or not to inactivate columns that do not meet filtering criteria.

filterBasis : integer, (one of below)

Indicates what group of samples to use for filtering markers.

outputNegLog : bool, default: True

Whether or not to output negative log p-values in the filtering status spreadsheet for HWE filtering.

callRate : list containing criteria and threshold

Whether or not to filter on call rates.

numOfAlleles : list containing criteria and threshold

Whether or not to filter on the number of alleles.

maf : list containing criteria and threshold

  • If alleles are classified by major/minor: Whether or not to filter on the minor allele frequency
  • If alleles are classified by reference/alternate: Whether or not to filter on the alternate allele frequency

carrierCount : list containing criteria and threshold

Whether or not to filter on carrier count

hwep : list containing criteria and threshold

Whether or not to filter on HWE P-Value

fisherHwep : list containing criteria and threshold

Whether or not to filter on Fisher’s Exact HWE P-Value

signedHweR : list containing criteria and threshold

Whether or not to filter on Signed HWE R

Returns:

spreadsheet : Spreadsheet Object

Column statistics and filtering report spreadsheet

Examples

>>> filterSS = mySS.filterGenotypes(filterBasis = ghi.const.ControlsOnly,
                             callRate = [ghi.const.LessOrEqual, 0.75],
                             maf = [ghi.const.LessThan, 0.07],
                             numOfAlleles = [ghi.const.LessThan, 2])
>>> 
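The two-item [criterion, threshold] convention shown above can be made concrete with a small sketch. The comparison-constant values below are hypothetical stand-ins for the ghi.const constants, chosen only to illustrate how a criteria pair is evaluated:

```python
# Sketch of the [criterion, threshold] filtering convention.
LESS_THAN, LESS_OR_EQUAL = 0, 1              # assumed stand-in encodings

def fails_filter(value, criterion_pair):
    """True if 'value' meets the filter criterion (i.e. would be inactivated)."""
    criterion, threshold = criterion_pair
    if criterion == LESS_THAN:
        return value < threshold
    if criterion == LESS_OR_EQUAL:
        return value <= threshold
    raise ValueError("unknown criterion")

call_rate_filter = [LESS_OR_EQUAL, 0.75]     # mirrors the example above
assert fails_filter(0.75, call_rate_filter)  # 0.75 <= 0.75, so filtered
assert not fails_filter(0.90, call_rate_filter)
```
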
spreadsheet.genotypeAssociationTests(**kws)

Performs genotype association tests on the current spreadsheet with a dependent variable specified.

Not all tests are available for all models.

Parameters:

alleleRefField : string, default: “”

  • If a non-empty string is specified, alleles should be classified by reference allele vs. alternate allele, using this marker map field to determine the reference allele.
  • If an empty string is specified or this parameter is omitted, alleles will be classified by major allele vs. minor allele as determined from the data.

geneticModel : integer, (one of below)

The genetic model/test to use for association testing
  • ghi.const.ModelAllelic: Basic Allelic Tests
  • ghi.const.ModelGenotypic: Genotypic Tests
  • ghi.const.ModelAdditive: Additive model (default)
  • ghi.const.ModelDominant: Dominant model
  • ghi.const.ModelRecessive: Recessive model

corrTrend : bool, default: False

Whether or not to perform the Correlation/Trend test

chiSq : bool, default: False

Whether or not to perform the Chi-Squared Test

armitage : bool, default: False

Whether or not to perform the Cochran-Armitage Trend Test

exactArmitage : bool, default: False

Whether or not to perform the exact form of the Cochran-Armitage Trend Test

fishers : bool, default: False

Whether or not to perform Fisher’s Exact Test

oddsRatio : bool, default: False

Whether or not to calculate Odds Ratios

analysisDev : bool, default: False

Whether or not to calculate Analysis of Deviance

regression : bool, default: False

Whether or not to perform regression analysis

fTest : bool, default: False

Whether or not to perform an F-Test

bonferroni : bool, default: True

Whether or not to use Bonferroni adjustment

fdr : bool, default: True

Whether or not to calculate False Discovery Rate

singleValuePermutations : bool, default: False

Whether or not to perform single value permutation tests

fullScanPermutations : bool, default: False

Whether or not to perform full scan permutation tests

numPermutations : integer, >= 3, default: 0

The number of permutations to be used for the selected number of permutation tests.

showInflationFactor : bool, default: False

Whether or not to show the inflation factor, Chi-Squares and corrected values used in genomic control

specifyInflationFactor : bool, default: False

Whether or not to specify an inflation factor to use for genomic control

inflationFactor : double, >= 1.0, default: 1.0

The inflation factor (lambda) to use for genomic control

useMissings : bool, default: False

Whether or not to use missing values

outputPPQQ : bool, default: False

Whether to output data for P-P/Q-Q plots (of association test outputs).

outputNegLog : bool, default: True

Whether to output -log10 P (for association test outputs).

usePca : bool, default: False

Whether or not to use PCA.

pcaMaxTopComponents : integer, > 0, default: 10

The maximum number of principal components to include. Requires usePca = True.

pcaOutputCorrected : bool, default: False

Whether or not to output a spreadsheet containing PCA corrected values. Requires usePca = True.

pcaOutputPcSheet : bool, default: True

Whether or not to output a principal components spreadsheet. Requires usePca = True.

pcaOutputEigenSheet : bool, default: True

Whether or not to output an eigenvalue spreadsheet. Requires usePca = True.

pcaNormalization : integer, (one of below)

Indicates the PCA marker normalization method to use

pcaRecompute : bool, default: False

Whether or not to remove outliers and recompute principal components. Requires usePca = True.

pcaRecompStdDev : double, >= 1.0, default: 6

Number of standard deviations with which to identify outliers. Requires usePca = True and pcaRecompute = True.

pcaRecompCount : integer, >= 0, default: 5

The number of times to recompute principal components. Requires usePca = True and pcaRecompute = True.

pcaRecompComponents : integer, >= 0, default: 5

The number of components to use in identifying outliers. Requires usePca = True and pcaRecompute = True.

pcaPrecomputedSheet : integer

The Node ID of a pre-computed principal component spreadsheet to use instead of calculating components.

maf : bool, default: False

  • If alleles are classified by major/minor: Whether or not to output minor allele frequencies
  • If alleles are classified by reference/alternate: Whether or not to output alternate allele frequencies

callRate : bool, default: False

Whether or not to output call rates

numOfAlleles : bool, default: False

Whether or not to output the number of alleles per marker

carrierCount : bool, default: False

Whether or not to output the carrier count

fisherHwep : bool, default: False

Whether or not to output Fisher’s Exact HWE P-Values

hwep : bool, default: False

Whether or not to output HWE P-Values

signedHweR : bool, default: False

Whether or not to output Signed HWE R

genoCounts : bool, default: False

Whether or not to output genotype counts

alleleCounts : bool, default: False

Whether or not to output allele counts

outputStatPPQQ : bool, default: False

Whether to output data for P-P/Q-Q plots of marker statistic p-value outputs.

outputNegLogStatPValue : bool, default: True

Whether to output -log10(Value) for marker statistic p-value outputs.

Returns:

spreadsheets : a list of spreadsheet objects of length 5 as follows

  • 0: Association test output spreadsheet
  • 1: PCA-corrected input spreadsheet
  • 2: Principal component spreadsheet
  • 3: Principal component eigenvalues spreadsheet
  • 4: PCA outlier spreadsheet

Examples

>>> myResultsList = mySS.genotypeAssociationTests(corrTrend = 1, 
                         outputPPQQ = 1, usePca = True, 
                         pcaMaxTopComponents = 15, genoCounts = True)
>>> 
spreadsheet.genotypePCA(**kws)

This command performs PCA analysis and returns a list of spreadsheet objects, the contents of which will vary depending on the options used.

Parameters:

alleleRefField : string, default: “”

  • If a non-empty string is specified, alleles should be classified by reference allele vs. alternate allele, using this marker map field to determine the reference allele.
  • If an empty string is specified or this parameter is omitted, alleles will be classified by major allele vs. minor allele as determined from the data.

pcaGeneticModel : integer, (one of below)

The genetic model/test to use for association testing

pcaNormalization : integer, (one of below)

Indicates the PCA marker normalization method to use

pcaMaxTopComponents : integer, > 0, default: 10

The maximum number of principal components to include.

pcaOutputCorrected : bool, default: False

Whether or not to output a spreadsheet containing PCA corrected values.

pcaOutputPcSheet : bool, default: True

Whether or not to output a principal components spreadsheet.

pcaOutputEigenSheet : bool, default: True

Whether or not to output an eigenvalue spreadsheet.

pcaRecompute : bool, default: False

Whether or not to remove outliers and recompute principal components.

pcaRecompCount : integer, >= 0, default: 5

The number of times to recompute principal components. Requires pcaRecompute = True.

pcaRecompStdDev : double, >= 1.0, default: 6

Number of standard deviations with which to identify outliers. Requires pcaRecompute = True.

pcaRecompComponents : integer, >= 0, default: 5

The number of components to use in identifying outliers. Requires pcaRecompute = True.

pcaPrecomputedSheet : integer

The Node ID of a pre-computed principal component spreadsheet to use instead of calculating components.

Returns:

spreadsheets : a list of spreadsheet objects of length 4 as follows

  • 0: PCA-corrected input spreadsheet
  • 1: Principal component spreadsheet
  • 2: Principal component eigenvalues spreadsheet
  • 3: PCA outlier spreadsheet

Examples

>>> mySSList = mySS.genotypePCA(pcaMaxTopComponents = 35, 
                                pcaNormalization = ghi.const.PcaNormNone,
                                pcaOutputCorrected = True)
>>> 
spreadsheet.genotypeStatsByMarker(**kws)

This command creates an output spreadsheet containing the selected statistics.

Parameters:

alleleRefField : string, default: “”

  • If a non-empty string is specified, alleles should be classified by reference allele vs. alternate allele, using this marker map field to determine the reference allele.
  • If an empty string is specified or this parameter is omitted, alleles will be classified by major allele vs. minor allele as determined from the data.

maf : bool, default: False

  • If alleles are classified by major/minor: Whether or not to output minor allele frequencies
  • If alleles are classified by reference/alternate: Whether or not to output alternate allele frequencies

callRate : bool, default: False

Whether or not to output call rates

numOfAlleles : bool, default: False

Whether or not to output the number of alleles per marker

carrierCount : bool, default: False

Whether or not to output the carrier count

fisherHwep : bool, default: False

Whether or not to output Fisher’s Exact HWE P-Values

hwep : bool, default: False

Whether or not to output HWE P-Values

signedHweR : bool, default: False

Whether or not to output Signed HWE R

genoCounts : bool, default: False

Whether or not to output genotype counts

alleleCounts : bool, default: False

Whether or not to output allele counts

outputStatPPQQ : bool, default: False

Whether to output data for P-P/Q-Q plots of marker statistic p-value outputs.

outputNegLogStatPValue : bool, default: True

Whether to output -log10(Value) for marker statistic p-value outputs.

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> mySS.genotypeStatsByMarker(maf = 1, callRate = 1, hwep = 1, 
                               signedHweR = 1)
>>> 
spreadsheet.genotypeStatsBySample(**kws)

This command creates a list of one or more spreadsheet objects containing the selected statistics.

Parameters:

numWithMinorAllele : bool, default: False

Whether or not to output the number and fraction of genotypes with a minor allele (as determined from sample data)

numOfVariantGenotypes : bool, default: False

Whether or not to output the number of variant genotypes (non-reference). Requires the spreadsheet to be marker-mapped with a “Reference” field.

numOfSingletons : bool, default: False

Whether or not to output the number of singletons (variant genotype present only in the given sample). Requires the spreadsheet to be marker-mapped with a “Reference” field.

meanTiTv : bool, default: False

Whether or not to output the mean Ti/Tv of variant genotypes. Requires the spreadsheet to be marker-mapped with a “Reference” field.

thwPValue : bool, default: False

Whether or not to output the Hardy-Weinberg Thw P-Value (taken over all autosomal chromosomes and all samples)

genderInference : bool, default: False

Whether or not to output gender inference and X-Chromosome statistics (and statistics for other non-autosomal chromosomes)

mfHetThreshold : double, [0.0, 1.0], default: 0.02

Heterozygosity threshold used to distinguish gender

outputForEachAutosome : bool, default: False

Whether or not to output count and variant statistics for each autosomal chromosome

Returns:

spreadsheets : a list of up to four spreadsheet objects:

  • Overall, overall autosomal, and gender-related statistics by sample
  • Statistics of individual autosomes by sample (if outputForEachAutosome was selected)
  • Overall, overall autosomal, and gender-related statistics by category of sample (if a categorical column has been made dependent)
  • Statistics of individual autosomes by category of sample (if outputForEachAutosome was selected and a categorical column has been made dependent)

Examples

>>> mySS.genotypeStatsBySample(numWithMinorAllele = 1, meanTiTv = 1, 
                               genderInference = 1, mfHetThreshold = 0.05)
>>> 
spreadsheet.getColState(column)

Deprecated since version 7.4.0: Use colState() instead.

Parameters:

column : integer

Returns:

state : integer
spreadsheet.getRowState(row)

Deprecated since version 7.4.0: Use rowState() instead.

Parameters:

row : integer

Returns:

state : integer
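Migrating off the deprecated names is a straight rename. The mock class below is purely illustrative (not the real spreadsheet object) and assumes only what the deprecation notices state: each old method is replaced by its new counterpart with the same argument:

```python
# Illustrative mock of the deprecated-to-current mapping.
class _SheetLike:
    def colState(self, column):
        return 0                         # placeholder state value

    def rowState(self, row):
        return 0                         # placeholder state value

    # Deprecated names simply forward to their replacements.
    def getColState(self, column):
        return self.colState(column)

    def getRowState(self, row):
        return self.rowState(row)

s = _SheetLike()
assert s.getColState(5) == s.colState(5)
assert s.getRowState(1) == s.rowState(1)
```
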
spreadsheet.haplotypeAssociationTests(**kws)

This command runs haplotype association tests on the current spreadsheet and returns a reference to the results spreadsheet.

Parameters:

blockDefinitionSource : integer, (one of below)

The source of the haplotype block definitions

blockDefinitionSheet : integer

Specifies a spreadsheet containing haplotype block information. Requires blockDefinitionSource = ghi.const.HapBlockPrecomputed.

blockDefinitionCol : integer

Specifies a column number of the haplotype block spreadsheet containing the haplotype block information. Requires blockDefinitionSheet.

movingWindowType : integer, (one of below)

The type of moving window to use. Requires blockDefinitionSource = ghi.const.HapBlockMovingWindow.

fixedWindowSize : integer, >= 1, default: 2

The fixed window size. Requires movingWindowType = ghi.const.WindowFixed

dynamicWindowBasePairs : integer, >= 1, default: 10

The dynamic window size in kilo-base pairs. Requires movingWindowType = ghi.const.WindowDynamic.

limitDynamicMaxCols : integer, >= 1, default: 20

The maximum dynamic moving window size in markers. Requires movingWindowType = ghi.const.WindowDynamic.

showMarkerNames : bool, default: True

Whether or not to show the names of the markers on which the haplotypes are based.

haploCalculationType : integer, (one of below)

Haplotype tests are calculated based on:

chiSq : bool, default: False

Whether or not to calculate chi-squared statistics

oddsRatio : bool, default: False

Whether or not to calculate odds ratios

regression : bool, default: False

Whether or not to perform regression tests

haplotypeMethod : integer, (one of below)

Specifies the haplotype imputation algorithm

maxEmIterations : integer, >= 1, default: 50

Maximum number of EM iterations regardless of reaching convergence

emConvergeTolerance : real, > 0, default: 0.0001

Desired tolerance the EM algorithm will use to determine when it reaches convergence

frequencyThreshold : real, [0, 1), default: 0.01

Minimum imputed frequency to determine whether a haplotype is used

imputeMissing : bool, default: False

Whether or not to impute missing values

bonferroni : bool, default: True

Whether or not to output Bonferroni adjusted p-values

fdr : bool, default: True

Whether or not to compute False Discovery Rate

singleValuePermutations : bool, default: False

Whether or not to perform single value permutation tests

fullScanPermutations : bool, default: False

Whether or not to perform full scan permutation tests

numPermutations : integer, >= 3, default: 0

The number of permutation tests to be used

outputHaplotypeFreq : bool, default: False

Whether or not to output haplotype frequencies

outputPPQQ : bool, default: False

Whether or not to output P-P/Q-Q data

outputNegLog : bool, default: True

Whether or not to output -log10 P values

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> ss.haplotypeAssociationTests(
            blockDefinitionSource = ghi.const.HapBlockMovingWindow,
            movingWindowType = ghi.const.WindowFixed, fixedWindowSize = 3, 
            chiSq = True)
spreadsheet(
 Name=Haplotype Association Tests (Per Haplotype)
 Node ID=152
 Rows=21738
 Columns=6
 Data Set Name=Haplotype Association Tests (Per Haplotype)
)
>>> 
spreadsheet.haplotypeBlockDetection(method, **kws)

This command creates a haplotype blocks spreadsheet

Parameters:

method : integer, (one of below)

Method used in block detection

upperConfidenceBound : real, [0.0001, 0.999], default: 0.98

The minimum upper bound of the D’ statistic

lowerConfidenceBound : real, [0.0001, 0.999], default: 0.70

The minimum lower bound of the D’ statistic

confidenceLevel : real, (one of below)

The statistical confidence level

minUpperConfidence : real, [0.0001, 0.999], default: 0.90

The reject criteria based on the upper bound of the D’ statistic.

minMAF : real, [0.0001, 0.999], default: 0.05

Minor allele frequency to filter SNPs from blocks

maxMarkers : integer, >= 1, default: 30

Maximum number of SNPs in a block

maxBlockLength : integer, >= 1, default: 160

Maximum length of a block in kilo-base pairs

haplotypeMethod : integer, (one of below)

The method used to estimate haplotype frequencies

maxEmIterations : integer, >= 1, default: 50

Maximum number of EM iterations regardless of reaching convergence

emConvergeTolerance : real, > 0, default: 0.00001

Desired tolerance the EM algorithm will use to determine when it reaches convergence

frequencyThreshold : real, [0,1), default: 0.01

Minimum imputed frequency to determine whether a haplotype is used

imputeMissing : bool, default: False

Whether or not missing values are imputed

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> mySS.haplotypeBlockDetection(ghi.const.BlockDetectGabriel, 
                  haplotypeMethod = ghi.const.HapImputeEM)
spreadsheet(
 Name=Haplotype blocks, 272 markers in 114 groups
 Node ID=155
 Rows=272
 Columns=1
 Data Set Name=Haplotype blocks, 272 markers in 114 groups
)
>>> 
spreadsheet.haplotypeCalculations(cols, **kws)

This command computes the haplotype tables for a set of columns in a genotypic spreadsheet.

Parameters:

cols : list of integers

Column indexes for markers to compute haplotype tables for.

tableEM : bool, default: True

Whether or not to output an EM frequency table

tableCHM : bool, default: False

Whether or not to output a CHM frequency table

tableDiplotype : bool, default: False

Whether or not to output a diplotype table

tableHaplotype : bool, default: False

Whether or not to output a haplotype table

imputeMissings : bool, default: False

Whether or not to impute missing values

maxIters : integer, >= 1, default: 50

Maximum number of EM iterations performed regardless of reaching convergence

convTolerance : real, > 0, default: 0.0001

The desired tolerance that the EM algorithm will use to determine convergence

See also

Haplotype Tables

Examples

>>> mySS.haplotypeCalculations(range(5,15), tableEM = True, 
                               tableCHM = True, tableHaplotype = True)
[spreadsheet(
 Name=EM Frequency Table
 Node ID=158
 Rows=435
 Columns=4
 Data Set Name=EM Frequency Table
 ), spreadsheet(
 Name=CHM Frequency Table
 Node ID=161
 Rows=435
 Columns=5
 Data Set Name=CHM Frequency Table
 ), spreadsheet(
 Name=Haplotype Table
 Node ID=164
 Rows=3
 Columns=3
 Data Set Name=Haplotype Table
)]
>>> 
spreadsheet.haplotypeTrendRegression(**kws)

This command performs linear or logistic haplotype trend regression on the current spreadsheet object using the currently selected dependent column, and returns a list of objects.

Parameters:

blockDefinitionSource : integer, (one of below)

The source of the haplotype block definitions

blockDefinitionSheet : integer

Specifies a spreadsheet containing haplotype block information. Requires blockDefinitionSource = ghi.const.HapBlockPrecomputed.

blockDefinitionCol : integer

Specifies a column number of the haplotype block spreadsheet containing the haplotype block information. Requires blockDefinitionSheet.

movingWindowType : integer, (one of below)

The type of moving window to use. Requires blockDefinitionSource = ghi.const.HapBlockMovingWindow.

fixedWindowSize : integer, >= 1, default: 2

The fixed window size. Requires movingWindowType = ghi.const.WindowFixed

dynamicWindowBasePairs : integer, >= 1, default: 10

The dynamic window size in kilo-base pairs. Requires movingWindowType = ghi.const.WindowDynamic.

limitDynamicMaxCols : integer, >= 1, default: 20

The maximum dynamic moving window size in markers. Requires movingWindowType = ghi.const.WindowDynamic.

showMarkerNames : bool, default: True

Whether or not to show, in the spreadsheet output, the names of the markers on which the haplotypes are based.

outputResidualSheet : bool, default: False

Whether or not to output a residual spreadsheet.

useStepwise : bool, default: False

Whether or not to use stepwise regression.

stepwiseCutoff : double, [0, 1], default: 0.01

The p-value cutoff for stepwise regression.

stepwiseMethod : integer, (one of below)

The stepwise regression method.

computeFullVsReducedModel : bool, default: False

Whether or not to compute a full versus reduced model.

fullModelCovariates : list of column numbers

The columns to use as full model covariates, ex: [2,3]

fullModelInteractions : list of pairs of column numbers

Pairs of columns to use as interactions for full model covariates. ex: [[2,3],[3,4]]

reducedModelCovariates : list of column numbers

The columns to use as reduced model covariates, ex: [5,6]

reducedModelInteractions : list of pairs of column numbers

Pairs of columns to use as interactions for reduced model covariates. ex: [[2,3],[3,4]]

bonferroni : bool, default: True

Whether or not to use Bonferroni adjustment

fdr : bool, default: True

Whether or not to calculate False Discovery Rate

singleValuePermutations : bool, default: False

Whether or not to perform single value permutation tests

fullScanPermutations : bool, default: False

Whether or not to perform full scan permutation tests

numPermutations : integer, >=3, default: 0

The number of permutations to be used for the selected permutation tests.

outputPPQQ : bool, default: False

Whether to output data for P-P/Q-Q plots

outputNegLog : bool, default: True

Whether or not to output -log10 P values

outputDetailedResults : list of criteria

Indicates that detailed output should be included if the criteria are satisfied. Some threshold values are only available for certain models and regressions.

Returns:

spreadsheets : a list of length 3 as follows; any output that was not produced appears as 0 in its place

  • 0: Regression results spreadsheet
  • 1: Residual Spreadsheet
  • 2: Detailed regression results viewer

Examples

>>> ss.haplotypeTrendRegression(
              blockDefinitionSource = ghi.const.HapBlockMovingWindow,
              movingWindowType = ghi.const.WindowFixed, fixedWindowSize = 3,
              outputDetailedResults=[])
[ spreadsheet(
  Name=Regression Results
  Node ID=99
  Rows=200
  Columns=9
  Data Set Name=Regression Results
  ), 0, ResultViewer(
  Name = Regression Statistics Viewer
  Node ID = 100)
] 
>>> 
>>> subsetSs.haplotypeTrendRegression(
                    blockDefinitionSource = ghi.const.HapBlockAllMarkers,
                    outputResidualSheet = True)
[ 0, spreadsheet(
  Name=Residual Spreadsheet
  Node ID=104
  Rows=100
  Columns=8
  Data Set Name=Residual Spreadsheet
  ), ResultViewer(
  Name = Regression Statistics Viewer
  Node ID = 105)
] 
>>> 
>>> ss.haplotypeTrendRegression(
              blockDefinitionSource = ghi.const.HapBlockPrecomputed,
              blockDefinitionSheet = 109, blockDefinitionCol = 1,
              reducedModelCovariates = [2], 
              computeFullVsReducedModel = True,
              outputDetailedResults=[ghi.const.ThresholdMainP, 
              ghi.const.LessThan, .5])
[ spreadsheet(
  Name=Regression Results
  Node ID=112
  Rows =6
  Columns=11
  Data Set Name=Regression Results
  ), 0, ResultViewer(
  Name = Regression Statistics Viewer
  Node ID = 113)
] 
>>> 
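As the examples above show, outputs that were not requested appear as 0 in the returned list, so the list always has the same shape. A minimal pure-Python sketch of unpacking it (the strings below are placeholders standing in for the real spreadsheet and viewer objects):

```python
# Shape of the haplotypeTrendRegression return value:
# [regression_sheet, residual_sheet, detail_viewer], with 0 standing in
# for any output that was not produced.
results = ["Regression Results", 0, "Regression Statistics Viewer"]
regression_sheet, residual_sheet, detail_viewer = results

# Keep only the outputs that were actually produced.
produced = [item for item in results if item != 0]
```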
spreadsheet.hasPedFields()

This command indicates whether the current spreadsheet object contains pedigree columns.

Returns:

success : bool

Whether or not the spreadsheet contains pedigree columns

Examples

>>> myPedSheet.hasPedFields()
True
>>> myOtherSS.hasPedFields()
False
>>> 
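Because hasPedFields() returns a bool, it works naturally as a guard before pedigree-specific analyses. A small sketch of that pattern (StubSheet and require_pedigree are illustrative helpers, not part of the API):

```python
class StubSheet:
    """Illustrative stand-in for a spreadsheet object (not part of the
    API); only hasPedFields() matters here."""
    def __init__(self, has_ped):
        self._has_ped = has_ped

    def hasPedFields(self):
        return self._has_ped


def require_pedigree(sheet):
    """Raise if the sheet lacks pedigree columns; otherwise return it."""
    if not sheet.hasPedFields():
        raise ValueError("spreadsheet has no pedigree columns")
    return sheet
```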

spreadsheet.identityByDescentEstimation(**kws)

This command calculates identity by descent estimation from a genotype spreadsheet.

Parameters:

useOnlyFounders : bool, default: True

Whether or not to use only founders when counting alleles to compute allele frequencies for the IBD calculations. (Only applicable when the spreadsheet contains pedigree information.)

outputIbsDistances : bool, default: True

Whether or not to output identity-by-state distances, defined as ( (# of markers with IBS state 2) + 0.5 * (# of markers with IBS state 1) ) / (# of non-missing markers).

outputEstimates : bool, default: True

Whether or not to output untransformed estimates. This is useful for QC checking.

outputPI : bool, default: True, optional

Whether or not to output PI values calculated from untransformed estimates.

outputTransformed : bool, default: False

Whether or not to output PI values calculated from estimates transformed to always be biologically plausible values.

outputAllPairs : bool, default: False

Whether or not to output a sample by sample all pairs spreadsheet for all comparisons where PI is greater than a threshold.

piThreshold : real, [0, 1], default: 0.025

PI threshold for outputting pair estimates.

Returns:

spreadsheets : list of spreadsheet objects

Examples

>>> outputList = ss.identityByDescentEstimation(useOnlyFounders=True, 
                                                outputEstimates=True,
                                                outputPI=True, 
                                                outputTransformed=False,
                                                outputAllPairs=False, 
                                                piThreshold=0.025)
>>> outputList
[spreadsheet(
 Name=IBD Estimate:  P(Z = 0)
 Node ID=128
 Rows=3000
 Columns=3000
 Data Set Name=IBD Estimate:  P(Z = 0)
), spreadsheet(
 Name=IBD Estimate:  P(Z = 1)
 Node ID=131
 Rows=3000
 Columns=3000
 Data Set Name=IBD Estimate:  P(Z = 1)
), spreadsheet(
 Name=IBD Estimate:  P(Z = 2)
 Node ID=134
 Rows=3000
 Columns=3000
 Data Set Name=IBD Estimate:  P(Z = 2)
), spreadsheet(
 Name=IBD Estimate:  Estimated PI
 Node ID=137
 Rows=3000
 Columns=3000
 Data Set Name=IBD Estimate:  Estimated PI
)]
>>> 
>>> outputIBS = ss.identityByDescentEstimation(outputIbsDistances=True, 
                                               outputEstimates=False, 
                                               outputPI=False)
>>> outputIBS
[spreadsheet(
 Name=IBS Distance ( (IBS2 + 0.5*IBS1) / # non-missing markers )
 Node ID=318
 Rows=565
 Columns=565
 Data Set Name=IBS Distance ( (IBS2 + 0.5*IBS1) / # non-missing markers )
)]
>>> # The above example will compute and output only
>>> # (a list containing) an IBS spreadsheet.
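The IBS distance definition given above can be written out directly. This is only an illustration of the formula, not the program's implementation:

```python
def ibs_distance(ibs_states):
    """IBS distance per the definition above:
    ((# markers with IBS state 2) + 0.5 * (# with state 1)) / (# non-missing).
    ibs_states holds 0, 1, 2, or None (missing) for each marker."""
    non_missing = [s for s in ibs_states if s is not None]
    return (non_missing.count(2) + 0.5 * non_missing.count(1)) / len(non_missing)
```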
spreadsheet.invertColState()

This command causes the state of all columns to be inverted.

Columns that were active or dependent are set as inactive. Columns that were inactive are set as active.

Examples

>>> mySS.invertColState()
>>> 
spreadsheet.invertRowState()

This command causes the state of all rows to be inverted.

Rows that were active are set as inactive. Rows that were inactive are set as active.

Examples

>>> mySS.invertRowState()
>>> 
spreadsheet.joinByRowLabels(id, newDatasetName, merge, childOfCurrent, dupResolution, colCaseSensitive, rowCaseSensitive)

This command joins a second spreadsheet by row labels to the current spreadsheet object.

Parameters:

id : integer, >= 4

Node ID for the spreadsheet to join to the current spreadsheet

newDatasetName : string, default: ‘Joined Spreadsheet’

New name for the joined spreadsheet

merge : bool, default: False

Whether or not to keep non-matching rows and fill with missing values as necessary.

childOfCurrent : bool, default: True

Whether or not to create the joined spreadsheet as a child of the current spreadsheet.

dupResolution : integer, (one of below)

Behavior for handling duplicate column headers

colCaseSensitive : bool, default: True, optional

Whether to match the column headers case-sensitively or case-insensitively.

rowCaseSensitive : bool, default: True, optional

Whether to match the row labels case-sensitively or case-insensitively.

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> newSS = mySS.joinByRowLabels(166, childOfCurrent = False, 
                                 rowCaseSensitive = False)
>>> 
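The effect of rowCaseSensitive = False can be pictured as comparing labels after lower-casing them. A toy sketch with plain dicts (not the actual join implementation; real spreadsheets carry whole rows, but the matching rule is the same):

```python
# One value per row label, per sheet.
left = {"Sample1": 1.0, "SAMPLE2": 2.0}
right = {"sample1": 10, "sample2": 20}

# Case-insensitive match: compare labels after lower-casing.
joined = {label: (value, right[label.lower()])
          for label, value in left.items()
          if label.lower() in right}
```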
spreadsheet.markerMapFileName()

Returns the name of the project map file used to apply a marker map to this spreadsheet. Useful for calling applyMarkerMap on another spreadsheet with this string as the first parameter.

Returns:

mapName : string

Name of the marker map applied to the current spreadsheet

Examples

>>> mySS.markerMapFileName()
u'map/map8.dsm'
>>> 
spreadsheet.numColsState(state)

This command returns the number of columns of the specified column state in the current spreadsheet object.

Parameters:

state : integer, (one of below)

Returns:

numberOfState : integer

The number of columns of the specified state

Examples

>>> mySS.numColsState(ghi.const.StateDependent)
5
>>> 
spreadsheet.numericAssociationTests(**kws)

This command performs numeric association tests using the current spreadsheet and creates a list of spreadsheets containing the test results.

Parameters:

corrTrend : bool

Whether or not to perform the Correlation/Trend Test.

tTest : bool, default: False

Whether or not to perform a T-Test.

regression : bool, default: False

Whether or not to perform regression.

bonferroni : bool, default: True

Whether or not to use Bonferroni adjustment.

fdr : bool, default: True

Whether or not to use False Discovery Rate.

singleValuePermutations : bool, default: False

Whether or not to perform single value permutation tests.

fullScanPermutations : bool, default: False

Whether or not to perform full-scan permutation tests.

numPermutations : integer, >= 3, default: 0

The number of permutations to be used for the selected permutation tests.

outputPPQQ : bool, default: False

Whether to output data for P-P/Q-Q plots.

outputNegLog : bool, default: True

Whether or not to output -log10 P values.

usePca : bool, default: False

Whether or not to use PCA.

usePcaForDependent : bool, default: False

Whether to use a PCA corrected dependent.

pcaMaxTopComponents : integer, > 0, default: 10

The maximum number of principal components to include. Requires usePca = True.

pcaOutputCorrected : bool, default: False

Whether or not to output a spreadsheet containing PCA corrected values. Requires usePca = True.

pcaOutputPcSheet : bool, default: True

Whether or not to output a principal components spreadsheet. Requires usePca = True.

pcaOutputEigenSheet : bool, default: True

Whether or not to output an eigenvalue spreadsheet. Requires usePca = True.

pcaRecompute : bool, default: False

Whether or not to remove outliers and recompute principal components. Requires usePca = True.

pcaRecompStdDev : double, >= 1.0, default: 6

Number of standard deviations with which to identify outliers. Requires usePca = True and pcaRecompute = True.

pcaRecompCount : integer, >= 0, default: 5

The number of times to recompute principal components. Requires usePca = True and pcaRecompute = True.

pcaRecompComponents : integer, >= 0, default: 5

The number of components to use in identifying outliers. Requires usePca = True and pcaRecompute = True.

pcaPrecomputedSheet : integer

The Node ID of a pre-computed principal component spreadsheet to use instead of calculating components.

dataCentering : integer (one of below), optional

When PCA is computed, whether to perform data centering by marker, sample, or both.

Returns:

spreadsheets : a list of spreadsheet objects of length 5 as follows

  • 0: Association test output
  • 1: PCA-corrected input
  • 2: Principal component spreadsheet
  • 3: Principal component eigenvalues
  • 4: PCA outlier spreadsheet

Examples

>>> resultsList = mySS.numericAssociationTests(tTest = 1, corrTrend = 1,
                                                 bonferroni = 0)
>>> 
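The permutation options above follow the usual empirical p-value scheme: the observed statistic is compared to statistics recomputed on permuted data. A generic sketch of that calculation (not the program's internal code):

```python
def permutation_p_value(observed, permuted_stats):
    """Empirical p-value: the fraction of permuted statistics at least
    as extreme as the observed one, with the +1 correction so the
    p-value is never exactly zero."""
    hits = sum(1 for s in permuted_stats if s >= observed)
    return (hits + 1) / (len(permuted_stats) + 1)
```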
spreadsheet.numericPCA(**kws)

This command performs PCA analysis and returns a list of spreadsheet objects, the contents of which will vary depending on the options used.

Parameters:

pcaMaxTopComponents : integer, > 0, default: 10

The maximum number of principal components to include.

pcaOutputCorrected : bool, default: False

Whether or not to output a spreadsheet containing PCA corrected values.

pcaOutputPcSheet : bool, default: True

Whether or not to output a principal components spreadsheet.

pcaOutputEigenSheet : bool, default: True

Whether or not to output an eigenvalue spreadsheet.

pcaRecompute : bool, default: False

Whether or not to remove outliers and recompute principal components.

pcaRecompCount : integer, >= 0, default: 5

The number of times to recompute principal components. Requires pcaRecompute = True.

pcaRecompStdDev : double, >= 1.0, default: 6

Number of standard deviations with which to identify outliers. Requires pcaRecompute = True.

pcaRecompComponents : integer, >= 0, default: 5

The number of components to use in identifying outliers. Requires pcaRecompute = True.

pcaPrecomputedSheet : integer

The Node ID of a pre-computed principal component spreadsheet to use instead of calculating components.

dataCentering : integer (one of below), optional

When PCA is computed, whether to perform data centering by marker, sample, or both.

Returns:

spreadsheets : a list of spreadsheet objects of length 4 as follows

  • 0: PCA-corrected input spreadsheet
  • 1: Principal component spreadsheet
  • 2: Principal component eigenvalues spreadsheet
  • 3: PCA outlier spreadsheet

Examples

>>> mySSList = mySS.numericPCA(pcaMaxTopComponents = 27,
                               pcaOutputCorrected = 1)
>>> 
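The dataCentering option chooses whether values are centered per marker, per sample, or both before PCA is computed. Taking rows as samples and columns as markers, centering by marker means subtracting each column's mean; a pure-Python illustration of that step (the real computation is internal to the program):

```python
def center_by_marker(matrix):
    """Subtract each column (marker) mean from that column.
    matrix is a list of rows (samples), each a list of marker values."""
    n_rows = len(matrix)
    col_means = [sum(row[j] for row in matrix) / n_rows
                 for j in range(len(matrix[0]))]
    return [[row[j] - col_means[j] for j in range(len(row))]
            for row in matrix]
```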
spreadsheet.numericRegression(**kws)

This command performs linear or logistic regression on the current spreadsheet object using the currently selected dependent column, and returns a list of objects.

Parameters:

fullModelRegressors : integer, (one of below)

Indicates how the full model regressors are selected.

movingWindowType : integer, (one of below)

The type of moving window to use. Requires fullModelRegressors = ghi.const.RegressMovingWindow.

fixedWindowSize : integer, >= 1, default: 1

The size of the fixed moving window. Requires movingWindowType = ghi.const.WindowFixed.

dynamicWindowBasePairs : integer, >= 1, default: 1000

The dynamic window size in base pairs. Requires movingWindowType = ghi.const.WindowDynamic.

limitDynamicMaxCols : bool, default: False

Whether or not to limit the number of columns that can be spanned by the dynamic moving window.

dynamicWindowMaxColumns : integer, >= 1, default: 20

The maximum dynamic moving window size in markers. Requires movingWindowType = ghi.const.WindowDynamic.

outputResidualSheet : bool, default: False

Whether or not to output a residual spreadsheet.

useStepwise : bool, default: False

Whether or not to use stepwise regression.

stepwiseCutoff : double, [0, 1], default: 0.01

The p-value cutoff for stepwise regression.

stepwiseMethod : integer, (one of below)

The stepwise regression method.

computeFullVsReducedModel : bool, default: False

Whether or not to compute a full versus reduced model. Defaults to True for covariate-column interactions (fullModelRegressors = ghi.const.RegressCovInter).

fullModelCovariates : list of column numbers

The columns to use as full model covariates. Ex: [2,3]

fullModelInteractions : list of pairs of column numbers

Pairs of columns to use as interactions for full model covariates. Ex: [[2,3],[3,4]]

reducedModelCovariates : list of column numbers

The columns to use as reduced model covariates. Ex: [5,6]

reducedModelInteractions : list of pairs of column numbers

Pairs of columns to use as interactions for reduced model covariates. Ex: [[2,3],[3,4]]

covColInteractions : list of column numbers

For regressing on covariate-column interactions, the columns to use as covariates that will interact with the predictor. Ex: [2,3]

threeWayInteractions : list of pairs of column numbers

For regressing on covariate-column interactions, pairs of columns to use as the first two covariates for three-way (second-order) interactions with the predictor. Ex: [[2,3],[3,4]]

additionalReducedCovariates : list of column numbers

For regressing on covariate-column interactions, the columns to use as additional reduced model covariates. Ex: [5,6]

additionalReducedInteractions : list of pairs of column numbers

For regressing on covariate-column interactions, pairs of columns to use as additional (two-way, first-order) interactions for reduced model covariates. Ex: [[2,3],[3,4]]

bonferroni : bool, default: True

Whether or not to use Bonferroni adjustment

fdr : bool, default: True

Whether or not to calculate False Discovery Rate

singleValuePermutations : bool, default: False

Whether or not to perform single value permutation tests

fullScanPermutations : bool, default: False

Whether or not to perform full scan permutation tests

numPermutations : integer, >=3, default: 0

The number of permutations to be used for the selected permutation tests.

outputPPQQ : bool, default: False

Whether to output data for P-P/Q-Q plots

outputNegLog : bool, default: True

Whether or not to output -log10 P values

outputDetailedResults : list of criteria

Indicates that detailed output should be included if the criteria are satisfied. Some threshold values may only be available for certain models and regressions.

Returns:

spreadsheets : a list of spreadsheet objects of length 3 as follows

  • 0: Regression results spreadsheet
  • 1: Residual Spreadsheet
  • 2: Detailed regression results viewer

Examples

>>> myResultsSSList = mySS.numericRegression()
>>> 
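The interaction parameters expect lists of column-number pairs. If you want every pairwise interaction among a set of covariate columns, the pairs can be generated rather than typed out (a convenience sketch, not part of the API):

```python
from itertools import combinations

covariates = [2, 3, 4]
# All pairwise interactions among the covariate columns, in the
# [[c1, c2], ...] form that fullModelInteractions expects.
interactions = [list(pair) for pair in combinations(covariates, 2)]
```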
spreadsheet.numRowsState(state)

This command returns the number of rows of the specified row state in the current spreadsheet object.

Parameters:

state : integer, (one of below)

Returns:

numberOfState : integer

The number of rows of the specified state

Examples

>>> mySS.numRowsState(ghi.const.StateInactive)
20
>>> 
spreadsheet.permuteRows()

This command randomly permutes the rows in the current spreadsheet object.

spreadsheet.plotColumns(*args)

This command takes a column index or list of indexes for a numeric column and plots numeric value plots. The X-axis corresponds to row labels and the Y-axis corresponds to the values in the numeric column.

Parameters:

plot *one* column only:

col : integer

Integer column number to plot

plot *multiple* columns:

cols : list of integers

List of column indexes to plot

oneItemPerGraph : bool, default: True

Whether or not to plot one item per graph. If False, all items are plotted in one graph.

Examples

>>> mySS.plotColumns(1)
>>> mySS.plotColumns([1,2,5], oneItemPerGraph = True)
>>> 
spreadsheet.plotDependents(col, oneItemPerGraph)

This command takes a column index for a numeric column and plots XY scatter plots. The independent variable (X-axis) corresponds to the column specified and the dependent variable(s) are all columns whose states are set as dependent.

Parameters:

col : integer

Column number of independent variable

oneItemPerGraph : bool, default: True

Whether or not to plot one item per graph. If False, all items are plotted in one graph.

Examples

>>> mySS.setColState([1,5,6,8], ghi.const.StateDependent)
>>> mySS.plotDependents(2, oneItemPerGraph = False)
>>> 
spreadsheet.plotHeatMap()

This command plots a Heat Map for the active numeric values in the spreadsheet. There are no parameters to specify. If the spreadsheet is marker mapped, the heat map will be plotted with the markers along the X-axis regardless of the orientation of the markers in the spreadsheet. If there is no marker map applied to the spreadsheet, the columns will be plotted along the X-axis on a uniform scale.

Examples

>>> mySS.plotHeatMap()
>>> 
spreadsheet.plotHistograms(*args)

This command takes a column index or list of indexes for a numeric column and plots histograms. The X-axis corresponds to the values in the numeric column and the Y-axis corresponds to frequency counts for each histogram bin.

Parameters:

plot *one* column only:

col : integer

Integer column number to plot

plot *multiple* columns:

cols : list of integers

List of column indexes to plot

oneItemPerGraph : bool, default: True

Whether or not to plot one item per graph. If False, all items are plotted in one graph.

Examples

>>> mySS.plotHistograms(1)
>>> mySS.plotHistograms([1,2,5], oneItemPerGraph = True)
>>> 
spreadsheet.plotLD()

This command plots LD for the active genotype columns in the spreadsheet. There are no parameters to specify. If the spreadsheet is marker mapped, the columns will be plotted based on genetic distance. Otherwise, the columns will be plotted on a uniform scale.

Examples

>>> mySS.plotLD()
>>> 
spreadsheet.plotXY(independentCol, depCols, oneItemPerGraph)

This command takes a column index for the independent column and plots XY scatter plots for each specified dependent column. The independent variable (X-axis) corresponds to the column specified and the dependent variable(s) are all columns specified in the dependent column list.

Parameters:

independentCol : integer

Column number for the independent variable (X-axis)

depCols : list of integers

List of column numbers for the dependent variables (Y-axes)

oneItemPerGraph : bool, default: True

Whether or not to plot one item per graph. If False, all items are plotted in one graph.

See also

plotDependents

Examples

>>> mySS.plotXY(2, [1,5,6,8], oneItemPerGraph = False)
>>> 
spreadsheet.recodeGenotypes(**kws)

This command takes the data from a genotypic spreadsheet and recodes it according to the selected encoding, outputting a new spreadsheet with the recoded data.

Parameters:

convertTo : string

The encoding to use for the recoded data.

The following are valid values for this parameter:

  • “RefAlt” :

    Recode to A_A, A_r, or r_r based on whether the genotype has two, one, or zero alternate alleles. Requires the markerMapField parameter. The allAltsAlike parameter may also be used.

  • “MajorMinor” :

    Recode to D_D, D_d, or d_d based on whether the genotype has two, one, or zero minor alleles. If the spreadsheet has a marker map that includes both a Major and Minor allele field then these fields are used for recoding. Otherwise, Major vs. minor alleles are determined from the data.

  • “Numeric” :

    Recode to a numeric score based on the genetic model. The score will be based on either alternate allele counts or minor allele counts, depending on whether or not the markerMapField parameter is specified. The geneticModel parameter is required to be used for this encoding. If the markerMapField parameter is used, the allAltsAlike parameter may also be used.

  • “FlippedGenotypes” :

    Flip DNA strands for AGCT-encoded genotypes.

  • “AGCT” :

    Transcode AB-encoding to AGCT-encoding using a marker map field (specified by the markerMapField parameter) in format ‘A/B’.

  • “MappedAlleles” :

    Transcode using a marker map field (specified by the markerMapField parameter) containing an allele mapping in the format ‘A:G B:T’. If necessary, an error spreadsheet will also be generated that specifies columns that did not get recoded and why they did not.

geneticModel : integer, (one of below)

The genetic model to be used for recoding when convertTo is “Numeric”:

markerMapField : string

The marker map field to be used for recoding.

  • If convertTo is “RefAlt” or “Numeric”, this field should specify the reference allele.
  • If convertTo is “AGCT”, this field should be in format ‘A/B’, where allele “A” should be converted to the first letter and allele “B” should be converted to the second letter.
  • If convertTo is “MappedAlleles”, this field should be in format ‘A:G B:T’, where the allele name specified before any colon should be transformed to the allele name after that colon.

allAltsAlike : bool, default: False

Set this parameter to True to treat any alternate allele the same, even if the column has multiple alternate alleles.

To be used with parameter markerMapField when convertTo is either “RefAlt” or “Numeric”.

dropNonRecoded : bool, default: False

Whether or not to drop spreadsheet columns that cannot be recoded (rather than copy them to the new spreadsheet).

childOfCurrent : bool, default: True

Whether or not to create the recoded spreadsheet as a child of the current spreadsheet.

Returns:

spreadsheet(s) :

A list which always contains the recoded spreadsheet as its first member. When convertTo is “MappedAlleles”, the list will also contain the error spreadsheet, if there is one.

If using “RefAlt” and an Alternates field is not present in the marker map one will be created.

If using “MajorMinor” and Major Allele and Minor Allele fields do not already exist in the marker map (if one is applied to the spreadsheet) these fields will be created.

Examples

>>> myRecodeList = mySS.recodeGenotypes(convertTo = "RefAlt",
                        markerMapField = "Reference", 
                        allAltsAlike = False,
                        childOfCurrent = False, 
                        dropNonRecoded = True)
>>> myRecodeList = mySS.recodeGenotypes(convertTo = "MajorMinor")
>>> myRecodeList = mySS.recodeGenotypes(convertTo = "Numeric",
                        geneticModel = ghi.const.ModelAdditive,
                        markerMapField = "Reference", 
                        allAltsAlike = True)
>>> # Uses Major/Minor
>>> myRecodeList = mySS.recodeGenotypes(convertTo = "Numeric",     
                        geneticModel = ghi.const.ModelDominant)
>>> myRecodeList = mySS.recodeGenotypes(convertTo = "FlippedGenotypes")
>>> myRecodeList = mySS.recodeGenotypes(convertTo = "AGCT",
                        markerMapField = "Reference Alleles A/B")
>>> myRecodeErrList = mySS.recodeGenotypes(convertTo = "MappedAlleles",
                                markerMapField = "alleleMap")
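Under the additive model, the "Numeric" encoding scores a genotype by its count of alternate (or minor) alleles; the dominant and recessive models collapse that count to 0 or 1. A toy sketch of the scoring rule (the program's actual recoding logic is internal; this only illustrates the genetic models):

```python
def numeric_score(genotype, minor_allele, model="additive"):
    """Score a genotype string such as 'A_G' by its minor-allele count.
    additive: 0, 1, or 2; dominant: 1 if any minor allele is present;
    recessive: 1 only if both alleles are minor."""
    count = genotype.split("_").count(minor_allele)
    if model == "additive":
        return count
    if model == "dominant":
        return 1 if count >= 1 else 0
    if model == "recessive":
        return 1 if count == 2 else 0
    raise ValueError("unknown model: %s" % model)
```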
spreadsheet.repr()

Return a description of the current spreadsheet.

Returns:

spreadsheetDescription : string

spreadsheet.rohGWAS(**kws)

This command tests for runs of homozygosity in the current spreadsheet object. This version of roh has options more appropriate to GWAS data.

Parameters:

minLengthKBase : double, >= 1.0, default: 500

Specify a run based on length in kilo-base pairs

minSnpsKBase : integer, >= 2, default: 25

Sets the minimum number of SNPs in a run as determined by the length in k-bp. Requires minLengthKBase.

minLengthSnps : integer, >= 2, default: 100

The minimum number of SNPS that constitute a run.

minSamples : integer, >= 1, default: 20

Minimum number of samples that must contain a run to define a cluster of runs. Requires createClusterSheet = True.

allowHetero : bool, default: True

Whether or not heterozygotes can be included in a run.

maxHetero : integer, >= 1, default: 1

Maximum number of allowed heterozygotes. Requires allowHetero = True.

useHeteroDensity : bool, default: False

Whether or not to restrict the number of heterozygotes within a set window size.

heteroDensity : integer, >= 1, default: 1

Maximum number of heterozygotes allowed in a set window size.

heteroDensityRange : integer, >= 2, default: 5

Size of the window in which to restrict the number of heterozygotes.

useHeteroConsec : bool, default: False

Whether or not to restrict the number of heterozygotes that appear in consecutive order.

maxHeteroConsec : integer, >= 1, default: 1

Maximum number of heterozygotes that can appear in consecutive order.

restrictMissing : bool, default: True

Whether or not the number of allowed missing genotypes in a run should be restricted.

maxMissing : integer, >= 0, default: 5

Maximum number of allowed missing genotypes in a run. Requires restrictMissing = True.

useMissingDensity : bool, default: False

Whether or not to restrict the number of missing genotypes within a set window size.

missingDensity : integer, >= 1, default: 1

Maximum number of missing genotypes allowed in a set window size.

missingDensityRange : integer, >= 2, default: 5

Size of the window in which to restrict the number of missing genotypes.

useMissingConsec : bool, default: False

Whether or not to restrict the number of missing genotypes that appear in consecutive order.

maxMissingConsec : integer, >= 1, default: 1

Maximum number of missing genotypes that can appear in consecutive order.

restrictGap : bool, default: True

Whether or not the maximum distance between SNPs in a run should be restricted.

maxGap : double, >= 1.0, default: 100

The maximum gap between SNPs in a run. Requires restrictGap = True.

restrictDensity : bool, default: False

Whether or not the minimum density of SNPs in a run should be restricted.

minDensity : double, >= 1.0, default: 50

The minimum density of SNPs in a run. Requires restrictDensity = True.

createRunsSheet : bool, default: True

Whether or not to output a spreadsheet containing each homozygous run per sample.

createBinarySheet : bool, default: True

Whether or not to output a spreadsheet containing the binary ROH run status.

createIncidenceSheet : bool, default: True

Whether or not to output a spreadsheet containing the incidence of common runs per SNP.

createClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing summary info for clusters of runs if clusters were found.

createFirstColClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for the first column of each cluster.

createEveryColClusterSheet : bool, default: False

Whether or not to output a spreadsheet containing cluster info for every column of each cluster.

createOptimalClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing summary info for optimal clusters of runs.

createFirstOptimalClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for the first column of each optimal cluster

createEveryOptimalClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for every column of each optimal cluster.

createHaploClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing summary info for haplotypic similarity clusters of runs.

createFirstHaploClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for the first column of each haplotypic similarity cluster.

createEveryHaploClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for every column of each haplotypic similarity cluster.

Returns:

spreadsheets : a list of spreadsheet objects of length 12 as follows

  • 0: Homozygous runs summary spreadsheet
  • 1: Cluster of runs summary spreadsheet
  • 2: Cluster of runs spreadsheet (first column of each cluster)
  • 3: Cluster of runs spreadsheet (every column of each cluster)
  • 4: Optimal cluster of runs summary spreadsheet
  • 5: Optimal cluster of runs spreadsheet (first column of each cluster)
  • 6: Optimal cluster of runs spreadsheet (every column of each cluster)
  • 7: Haplotypic similarity cluster of runs summary spreadsheet
  • 8: Haplotypic similarity cluster of runs spreadsheet (first column of each cluster)
  • 9: Haplotypic similarity cluster of runs spreadsheet (every column of each cluster)
  • 10: Spreadsheet containing the incidence of common runs per SNP
  • 11: Spreadsheet containing the binary ROH run status

Examples

>>> mySSList = mySS.rohGWAS(minLengthSnps = 100)
>>> 
spreadsheet.rohNGS(**kws)
This command tests for runs of homozygosity in the current spreadsheet
object. This version of ROH analysis has options more appropriate for NGS data.
Parameters:

minLengthKBase : double, >= 1.0, default: 500

The minimum length, in kilobase pairs, that defines a run.

minSnpsKBase : integer, >= 2, default: 25

Sets the minimum number of SNPs in a run as determined by the length in k-bp. Requires minLengthKBase.

minLengthSnps : integer, >= 2, default: 100

The minimum number of SNPs that constitute a run.

minSamples : integer, >= 1, default: 20

Minimum number of samples that must contain a run to define a cluster of runs. Requires createClusterSheet = True.

allowHetero : bool, default: True

Whether or not heterozygotes can be included in a run.

maxHetero : integer, >= 1, default: 1

Maximum number of allowed heterozygotes. Requires allowHetero = True.

useHeteroDensity : bool, default: False

Whether or not to restrict the number of heterozygotes within a set window size.

heteroDensity : integer, >= 1, default: 1

Maximum number of heterozygotes allowed in a set window size.

heteroDensityRange : integer, >= 2, default: 5

Size of the window in which to restrict the number of heterozygotes.

useHeteroConsec : bool, default: False

Whether or not to restrict the number of heterozygotes that appear in consecutive order.

maxHeteroConsec : integer, >= 1, default: 1

Maximum number of heterozygotes that can appear in consecutive order.

treatMissingAsHomo : bool, default: False

Whether or not to treat missing genotypes as homozygous reference calls.

restrictMissing : bool, default: True

Whether or not the number of allowed missing genotypes in a run should be restricted.

maxMissing : integer, >= 0, default: 5

Maximum number of allowed missing genotypes in a run. Requires restrictMissing = True.

useMissingDensity : bool, default: False

Whether or not to restrict the number of missing genotypes within a set window size.

missingDensity : integer, >= 1, default: 1

Maximum number of missing genotypes allowed in a set window size.

missingDensityRange : integer, >= 2, default: 5

Size of the window in which to restrict the number of missing genotypes.

useMissingConsec : bool, default: False

Whether or not to restrict the number of missing genotypes that appear in consecutive order.

maxMissingConsec : integer, >= 1, default: 1

Maximum number of missing genotypes that can appear in consecutive order.

createRunsSheet : bool, default: True

Whether or not to output a spreadsheet containing each homozygous run per sample.

createBinarySheet : bool, default: True

Whether or not to output a spreadsheet containing the incidence of common runs per SNP.

createIncidenceSheet : bool, default: True

Whether or not to output a spreadsheet containing binary ROH run status.

createClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing summary info for clusters of runs if clusters were found.

createFirstColClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for the first column of each cluster.

createEveryColClusterSheet : bool, default: False

Whether or not to output a spreadsheet containing cluster info for every column of each cluster.

createOptimalClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing summary info for optimal clusters of runs.

createFirstOptimalClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for the first column of each optimal cluster.

createEveryOptimalClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for every column of each optimal cluster.

createHaploClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing summary info for haplotypic similarity clusters of runs.

createFirstHaploClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for the first column of each haplotypic similarity cluster.

createEveryHaploClusterSheet : bool, default: True

Whether or not to output a spreadsheet containing cluster info for every column of each haplotypic similarity cluster.

Returns:

spreadsheets : a list of spreadsheet objects of length 12 as follows

  • 0: Homozygous runs summary spreadsheet
  • 1: Cluster of runs summary spreadsheet
  • 2: Cluster of runs spreadsheet (first column of each cluster)
  • 3: Cluster of runs spreadsheet (every column of each cluster)
  • 4: Optimal cluster of runs summary spreadsheet
  • 5: Optimal cluster of runs spreadsheet (first column of each cluster)
  • 6: Optimal cluster of runs spreadsheet (every column of each cluster)
  • 7: Haplotypic similarity cluster of runs summary spreadsheet
  • 8: Haplotypic similarity cluster of runs spreadsheet (first column of each cluster)
  • 9: Haplotypic similarity cluster of runs spreadsheet (every column of each cluster)
  • 10: Spreadsheet containing the incidence of common runs per SNP
  • 11: Spreadsheet containing the binary ROH run status

Examples

>>> mySSList = mySS.rohNGS(minLengthSnps = 100)
>>> 
spreadsheet.row(rowNumber, state)

Returns a list of elements in the current spreadsheet object from the specified row number (1-based).

Parameters:

rowNumber : integer, >= 0

The row number of desired row, 0 = column headers.

state : integer, (one of below), optional

Returns:

row : list of values

See also

zrow, datamodel.row

Examples

>>> row = mySS.row(3, ghi.const.StateDependent)
>>> row
[76.868]
>>> 
spreadsheet.zrow(rowNumber, state)

This command returns a list of elements in the current spreadsheet object from the specified row number (0-based).

Parameters:

rowNumber : integer, >= 0

The row number of desired row, 0 = first row of data.

state : integer, (one of below), optional

Returns:

row : list of values

See also

row, datamodel.row

Examples

>>> row = mySS.zrow(2, ghi.const.StateDependent)
>>> row
[76.868]
>>> 
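
The only difference between row() and zrow() is the indexing convention. The relationship can be sketched in plain Python (this is an illustration, not the ghi API; the data layout below is a hypothetical stand-in):

```python
# Hypothetical data layout: index 0 holds the column headers,
# indexes 1..n hold the data rows (mirroring row()'s 1-based view).
data = [["weight (lbs)", "height (in)"],
        [62.385, 70.1],
        [76.868, 68.0]]

def row(n):
    # 1-based convention: row(0) returns the column headers.
    return data[n]

def zrow(n):
    # 0-based convention: zrow(0) returns the first row of data.
    return data[n + 1]

print(row(2) == zrow(1))  # True: the same data row under both conventions
```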
spreadsheet.rowIndexes(state)

Return a list of 1-based row indexes for a given state.

Parameters:

state : integer, (one of below)

Returns:

rowIdxs : list of row indexes

Examples

>>> rowIndexes = mySS.rowIndexes(ghi.const.StateActive)
>>> 
spreadsheet.rowLabels(state)

This command returns a list of the row labels.

Parameters:

state : integer, (one of below)

Returns:

rowLabels : list of row labels

Examples

>>> rowLabels = mySS.rowLabels(ghi.const.StateActive)
>>> 
spreadsheet.rowState(row)

This command takes a row index as a parameter and returns the state of the specified row.

Parameters:

row : integer, >= 1

Row number

Returns:

rowState : integer, (one of below)

Examples

>>> mySS.rowState(18)
0
>>> mySS.rowState(20)
1
>>> 
spreadsheet.rowSubset()

Creates a new spreadsheet object from the active rows of the current spreadsheet object.

Returns:subsetSpreadsheet : Spreadsheet Object
spreadsheet.savePED()

This will export spreadsheet data in PED/MAP format. This function is equivalent to the File > Save As... > PED/TPED/BED feature.

Examples

>>> mySS.savePED()
>>> 
spreadsheet.selectRowsByColumnBoolean(colIdx, value)

This command takes the specified binary column number and activates and inactivates rows based on the value in the column for the current spreadsheet object.

Parameters:

colIdx : integer, >= 1

The binary column number

value : bool, (one of below), optional

Indicates the binary value for activating rows
  • 0: Activate rows with 0’s, inactivate rows with 1 or missing
  • 1: Activate rows with 1’s, inactivate rows with 0 or missing

Examples

>>> mySS.selectRowsByColumnBoolean(1,1)
>>> 
spreadsheet.selectRowsByColumnValue(colIdx, threshold, lessThanFlag)

Deprecated since version 7.6.7: use selectAllRowsByColumnValue() instead.

This command activates or inactivates rows based on threshold values from the specified column from the current spreadsheet object.

Parameters:

colIdx : integer, >= 1

The column number to use for activation.

threshold : int or double

The threshold value.

lessThanFlag : bool, default: True, optional

Whether or not to activate all rows less than or equal to the threshold. False activates all rows greater than or equal to the threshold.

Examples

>>> mySS.selectRowsByColumnValue(1,36.35,True)
>>> 
spreadsheet.selectAllRowsByColumnValue(colIdx, threshType, threshold)

This command activates or inactivates rows based on threshold values from the specified column from the current spreadsheet object.

Parameters:

colIdx : integer, >= 1

The column number to use for selection. This must be a numeric (integer or real) column.

threshType : int

The type of selection threshold.

  • ghi.const.LessThan

    The column value is compared with the threshold, and if the column value is less than the threshold, the row is activated or left active. Otherwise, the row is inactivated or left inactive.

  • ghi.const.LessOrEqual

  • ghi.const.GreaterThan

  • ghi.const.GreaterOrEqual

  • ghi.const.Equal

  • ghi.const.NotEqual

threshold : int or double

The threshold value.

Examples

>>> mySS.selectAllRowsByColumnValue(2, ghi.const.LessOrEqual, 5)
>>> 
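
The selection semantics can be sketched in plain Python (an illustration, not the ghi API; the string keys below stand in for the ghi.const threshold types listed above):

```python
import operator

# Stand-ins for the ghi.const threshold types.
THRESH_OPS = {
    "LessThan": operator.lt,
    "LessOrEqual": operator.le,
    "GreaterThan": operator.gt,
    "GreaterOrEqual": operator.ge,
    "Equal": operator.eq,
    "NotEqual": operator.ne,
}

def select_all_rows(values, thresh_type, threshold):
    # A row is activated (True) when its column value passes the
    # comparison; every other row is inactivated (False).
    cmp = THRESH_OPS[thresh_type]
    return [cmp(v, threshold) for v in values]

print(select_all_rows([3, 5, 7], "LessOrEqual", 5))  # [True, True, False]
```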
spreadsheet.selectActiveRowsByColumnValue(colIdx, threshType, threshold)

This command performs selection on active rows based on threshold values from the specified column from the current spreadsheet object.

NOTE: This command will not re-activate inactive rows that would otherwise have been selected. Please see selectAllRowsByColumnValue() to select over all spreadsheet rows.

Parameters:

colIdx : integer, >= 1

The column number to use for selection. This must be a numeric (integer or real) column.

threshType : int

The type of selection threshold.

  • ghi.const.LessThan

    The column value is compared with the threshold, and if the column value is less than the threshold, the (active) row is left active. Otherwise, the (active) row is inactivated. (Inactive rows are left inactive.)

  • ghi.const.LessOrEqual

  • ghi.const.GreaterThan

  • ghi.const.GreaterOrEqual

  • ghi.const.Equal

  • ghi.const.NotEqual

threshold : int or double

The threshold value.

Examples

>>> mySS.selectActiveRowsByColumnValue(2, ghi.const.GreaterThan, 34.2)
>>> mySS.selectActiveRowsByColumnValue(2, ghi.const.LessThan, 36.8)
>>> 
>>> # The above examples, if both done in sequence, will select
>>> # previously-active rows where the column value is in a range
>>> # between 34.2 and 36.8.
>>> 
spreadsheet.selectAllRowsByCategories(colIdx, categories)

This command activates or inactivates rows based on categories as specified by the selected column from the current spreadsheet object.

Parameters:

colIdx : integer, >= 1

The column number to use for selection. This must be either a binary, a categorical, or a genotypic column.

categories : list of strings

List of categories for rows that are to be activated or left active. All other rows are inactivated or left inactive.

NOTE: Not every category in this list need correspond to a value that occurs within column colIdx.

Examples

>>> mySS.selectAllRowsByCategories(2, ["1", "?"])    # Binary column
>>> mySS.selectAllRowsByCategories(3, ["T", "N"])    # Categorical column
>>> mySS.selectAllRowsByCategories(4, ["G_G", "T_T"])# Genotypic column
>>> 
>>> # NOTE: Each selection above overrides all previous
>>> # selections. If all of the above examples are done in
>>> # sequence, the end result will simply be that all rows with
>>> # either "G_G" or "T_T" in column 4 will be active and the
>>> # remainder of the rows will be inactive.
spreadsheet.selectActiveRowsByCategories(colIdx, categories)

This command performs selection on active rows based on categories as specified by the selected column from the current spreadsheet object.

NOTE: This command will not re-activate inactive rows that would otherwise have been selected. Please see selectAllRowsByCategories() to select over all spreadsheet rows.

Parameters:

colIdx : integer, >= 1

The column number to use for selection. This must be either a binary, a categorical, or a genotypic column.

categories : list of strings

List of categories for active rows that are to be left active. The other active rows are inactivated. (Inactive rows are left inactive.)

NOTE: Not every category in this list need correspond to a value that occurs within column colIdx.

Examples

>>> mySS.selectActiveRowsByCategories(2, ["0", "?"])        # Binary column
>>> mySS.selectActiveRowsByCategories(3, ["T", "Tb"])       # Categorical column
>>> mySS.selectActiveRowsByCategories(4, ["G_G", "G_T"])    # Genotypic column
>>> 
>>> # NOTE: Each selection above works on the rows that have
>>> # already been active. If all of the above examples are done
>>> # in sequence, the only rows remaining active will be those
>>> # that were already active to begin with that have a control
>>> # or missing value in column two, a "T" or "Tb" in column
>>> # three, AND either a "G_G" or a "G_T" in column four.
spreadsheet.setColState(columnIdx, state)

This command sets the specified column or list of columns to the specified state in the current spreadsheet object.

Parameters:

columnIdx : integer or list of integers

A column index or a list of column indexes

state : integer, (one of below)

Indicates what state to set the column(s) to.

Examples

>>> mySS.setColState([2,3,4,5,20], ghi.const.StateInactive)
>>> mySS.setColState(20, ghi.const.StateActive)
>>> 
spreadsheet.setMarkerMap(tabular, **kws)

This sets a marker map from another dataModel or spreadsheet to the current spreadsheet object.

Parameters:

tabular : Spreadsheet or DataModel Object

The object that contains the marker map to set as the map for the current spreadsheet.

columnOriented : bool, default: True

Whether or not the marker names are column name headers.

offset : integer, >= 0, default: 0

The offset in the current spreadsheet for the marker map.

Examples

>>> mySS.setMarkerMap(myMappedSS, columnOriented = True, offset = 1)
>>> 
spreadsheet.setRowState(*args)

This command sets the specified row or rows to the specified state in the current spreadsheet object.

There are three ways to use this command:
  1. setRowState(rowIdx, state): set state of a single row
  2. setRowState(firstRow, lastRow, state): set state of a range of rows
  3. setRowState(rowList, state): set state of a list of rows
Parameters:

Set state of a single row:

row : integer, >= 1

A row number

Set state of a range of rows:

firstRow : integer, >= 1

First row number

lastRow : integer, >= 1

Last row number

Set state of a list of rows:

rowList : list of integers

A list of row numbers

state : integer, (one of below)

Indicates whether to activate or inactivate the row(s)

Examples

>>> mySS.setRowState(range(1,mySS.numRows()+1), ghi.const.StateInactive)
>>> mySS.setRowState(1, ghi.const.StateActive)
>>> mySS.setRowState([2,5,19,25,47], ghi.const.StateActive)
>>> 
spreadsheet.setRowStateRandom(numRandomRows, select)

This command sets the randomly selected number of rows to the specified state in the current spreadsheet object.

Parameters:

numRandomRows : integer, >= 1

Number of rows to select and set the state of

select : integer, (one of below)

Set the state of the randomly selected rows to the chosen state

Examples

>>> mySS.setRowStateRandom(500, ghi.const.StateActive)
>>> 
spreadsheet.sortByColAscending(colNumber)

This command sorts the current spreadsheet object by arranging the values of the specified column number in ascending order.

Parameters:

colNumber : integer

Number of the column to sort the spreadsheet by

Examples

>>> mySS.sortByColAscending(20)
>>> 
spreadsheet.sortByColDescending(colNumber)

This command sorts the current spreadsheet object by arranging the values of the specified column number in descending order.

Parameters:

colNumber : integer

Number of the column to sort the spreadsheet by

Examples

>>> mySS.sortByColDescending(22)
>>> 
spreadsheet.sortByCustomOrder(sortOrder)

This command sorts the rows in the specified order. The required parameter is a list of the row numbers in the desired sort order. This list must be the same length as the number of rows in the spreadsheet.

Parameters:

sortOrder : list of integers

List of row numbers to sort the spreadsheet by.
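
How such a sort-order list is interpreted can be sketched in plain Python (an illustration, not the ghi API; the sample row labels are hypothetical):

```python
rows = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]

# The desired order, given as 1-based row numbers; the list must be the
# same length as the number of rows in the spreadsheet.
sort_order = [3, 1, 4, 2]

# New row i holds whatever was previously at row sort_order[i].
reordered = [rows[n - 1] for n in sort_order]
print(reordered)  # ['Sample 3', 'Sample 1', 'Sample 4', 'Sample 2']
```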

spreadsheet.sortColIdx()

Returns the column number last used for sorting.

Returns:

sortOrder : integer, (one of below)

  • 0: sorted by row labels
  • -1: not sorted
  • -2: sorted by custom order
spreadsheet.sortDirection()

If the spreadsheet was sorted, returns the sort direction last used. Should be used in conjunction with sortColIdx().

Returns:

lastSort : integer, (one of below)

  • 0: last sort was in ascending order
  • 1: last sort was in descending order
spreadsheet.stopLogging()

This command suppresses output to the Node Change Log for the current spreadsheet object. Logging resumes after calling the resumeLogging() command.

spreadsheet.transpose(newDatasetName, colType, **kws)

This command transposes all columns of the same specified column type in the current spreadsheet object.

Parameters:

newDatasetName : string

The name for the transposed dataset.

colType : integer, (one of below)

The column type to transpose

activeDataOnly : bool, default: True

Whether or not only active data should be transposed.

childOfCurrent : bool, default: True

Whether or not the transposed spreadsheet should be created as a child of the current spreadsheet.

labelHeader : string, default: ‘Columns’

The label for the column name headers

memoryLimit : integer, >= 64, default: current transpose cache size

The maximum amount of memory to be used while transposing.

Returns:

spreadsheet : Spreadsheet Object

Examples

>>> mySS.transpose("My SS Transposed", ghi.const.DataTypeGenotypic)
spreadsheet.unsort()

Returns the current spreadsheet object in the original row order.

Tabular Module

This module contains functions that act on a tabular object.

tabular.getColType(column)

Deprecated: see colType

Parameters:column : int
Returns:value : int
tabular.colType(column)

This command takes a column index as a parameter and returns the type of the specified column.

The possible column types are as follows:

  • ghi.const.TypeBinary = binary
  • ghi.const.TypeInteger = integer
  • ghi.const.TypeReal = real
  • ghi.const.TypeCategorical = categorical
  • ghi.const.TypeGenotypic = genotypic
Parameters:column : int
Returns:value : int
tabular.colHeader(colNumber)

Returns the column header for the given column.

Returns:header : column header

Examples

>>> myDatamodel.colHeader(1)
'weight (lbs)'
>>> myDatamodel.cell(0, 1)
'weight (lbs)'
tabular.findCol(colName)

This command searches for a column in the current spreadsheet object whose column name header is specified.

The column index of the first column found is returned. If no column name header matches the provided string, the value -1 is returned.

The parameter for this command is colName, a string column name header.

Parameters:colName : string
Returns:value : int
tabular.findCols(colNames)

This command searches for columns in the current spreadsheet object whose column name headers are specified.

The column indexes of the columns found are returned in a list. If no column name headers match any of the strings provided, an empty list ([]) is returned.

The parameter for this command is colNames, a list of string column name headers.

Parameters:colNames : list of strings
Returns:value : list
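
The first-match and no-match semantics of findCol()/findCols() can be sketched in plain Python (an illustration, not the ghi API; the sample headers are hypothetical):

```python
headers = ["weight (lbs)", "height (in)", "age", "Lab 1"]

def find_col(name):
    # Return the 1-based index of the first matching header, or -1.
    for i, h in enumerate(headers, start=1):
        if h == name:
            return i
    return -1

def find_cols(names):
    # Return the 1-based indexes of all matching headers (may be empty).
    return [i for i, h in enumerate(headers, start=1) if h in names]

print(find_col("age"))                   # 3
print(find_col("missing"))               # -1
print(find_cols(["age", "Lab 1", "x"]))  # [3, 4]
```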
tabular.findRow(rowLabel)

This command searches for a row in the current spreadsheet object whose row label is specified.

The row index of the first row found is returned. If no row label matches the provided string, the value -1 is returned.

The parameter for this command is rowLabel, a string row label.

Parameters:rowLabel : string
Returns:value : int
tabular.findRows(rowLabels)

This command searches for rows in the current spreadsheet object whose row labels are specified.

The row indexes of the rows found are returned in a list. If no row labels match any of the strings provided, an empty list ([]) is returned.

The parameter for this command is rowLabels, a list of string row labels.

Parameters:rowLabels : list of strings
Returns:value : list
tabular.numRows()

This command returns the number of rows (not including the column name header row) in the current spreadsheet object.

Returns:value : int
tabular.numCols()

This command returns the number of columns (not including the row label column) in the current spreadsheet object.

Returns:value : int
tabular.hasMarkerMap()

This command indicates if a marker map is applied to the spreadsheet object.

Returns:value : bool
tabular.getMarkerMapChromosome()

Deprecated: see markerMapChromosomes

Returns:value : list
tabular.markerMapChromosomes()

This command returns a list of the chromosome information for the spreadsheet from the applied marker map.

Returns:value : list
tabular.getMarkerMapPosition()

Deprecated: see markerMapPositions

Returns:value : list
tabular.markerMapPositions()

This command returns a list of the position information for the spreadsheet from the applied marker map.

Returns:value : list
tabular.getMarkerMapField(fieldNumber)

Deprecated: see markerMapField

Parameters:fieldNumber : int
Returns:value : list
tabular.markerMapField(fieldNumber)

This command returns a list of data for the given marker map field number from the applied marker map.

Parameters:fieldNumber : int
Returns:value : list
tabular.getMarkerMapFieldNames()

Deprecated: see markerMapFieldNames

Returns:value : list
tabular.markerMapFieldNames()

This command returns a list containing the names of the fields contained in the applied marker map. The names are listed in the order in which they appear in the marker map.

Returns:value : list
tabular.getMarkerMapFieldTypes()

Deprecated: see markerMapFieldTypes

Returns:value : list
tabular.markerMapFieldTypes()

This command returns a list containing the type of each column contained in the applied marker map. The types are listed in the order in which they appear in the marker map.

The returned types may take on the following values:

  • 1 = integer
  • 2 = real
  • 3 = categorical

Returns:value : list
tabular.getMarkerMapFieldCell(fieldNumber, columnNumber)

Deprecated: see markerMapFieldCell

Parameters:

fieldNumber : int

columnNumber : int

Returns:

value : variant

tabular.markerMapFieldCell(fieldNumber, columnNumber)

This command returns the data from the applied marker map for the given field at the given column number.

Note that accessing data for many cells using this method can be time-consuming. If many cells need to be accessed, it is faster to get the field as a whole and access data for all columns in that field before moving on to the next field.

Parameters:

fieldNumber : int

columnNumber : int

Returns:

value : variant

tabular.getMarkerMapOffset()

Deprecated: see markerMapOffset

Returns:value : int
tabular.markerMapOffset()

This command returns the offset of the mapped markers in the spreadsheet. If the first column (or row, in a row-mapped spreadsheet) is mapped, 0 is returned. For example, if there are 5 unmapped columns (or rows, in a row-mapped spreadsheet) before the first mapped one, 5 will be returned.

Returns:value : int
tabular.getMarkerMapOrientation()

Deprecated: see markerMapOrientation

Returns:value : int
tabular.markerMapOrientation()

This command returns the orientation of the marker map. It will return ghi.const.MapOrientationColumns if the map is oriented along the columns of the spreadsheet and ghi.const.MapOrientationRows if the map is oriented along rows. If there is no marker map applied, ghi.const.MapOrientationNone will be returned.

Returns:value : int
tabular.getOrderedChrList()

Deprecated: see orderedChrList

Returns:value : list
tabular.orderedChrList()

This command returns an ordered list of chromosomes that are present in the marker map.

Returns:value : list
tabular.computeLD(column1, column2, method, statistic, imputeMissings, maxIters, convTolerance)

This function returns the LD value for column1 and column2.

Note that these keyword arguments are only applicable when using the EM algorithm.

Parameters:

column1 : int

Indicates the column number for the first column

column2 : int

Indicates the column number for the second column

method : int

Indicates whether to use the EM or CHM algorithm

  • ghi.const.ImputeCHM CHM algorithm
  • ghi.const.ImputeEM EM algorithm

statistic : int, optional

Indicates whether to return R-squared or D-prime

  • ghi.const.StatRSquared R-squared
  • ghi.const.StatDPrime D-prime

imputeMissings : bool, optional

Indicates whether or not to impute missing values

  • False (default) do not impute missing values
  • True impute missing values

maxIters : int, optional

Indicates the maximum number of EM iterations performed regardless of reaching convergence

  • (default) 50
  • any integer >= 1

convTolerance : real, optional

Indicates the desired tolerance that the EM algorithm will use to determine convergence

  • (default) 0.0001
  • any floating point value >= 0
Returns:

value : real

tabular.computeLDStats(column1, column2)

This function returns three CHM LD statistics for column1 and column2.

Parameters:

column1 : int

Indicates the column number for the first column

column2 : int

Indicates the column number for the second column

Returns:

(RSquared, signedD, DPrime) : tuple of reals

CHM R Squared, the CHM signed D statistic, and CHM D Prime for the two columns are returned. The signed D statistic is between the minor alleles of the two columns. If either column has more than two alleles, a zero is returned for the signed D statistic.

tabular.genotypeAlleleCounts(colIdx)

This function returns a list of pairs of allele names and the counts of those alleles in a genotypic column. The missing allele is always last in the list regardless of its count.

For example, if a genotypic column had the values:

['A_A', 'A_B', 'A_A', 'B_B', '?_?']

Calling this function on that column would return:

[['A',5], ['B',3], ['?', 2]]

For bi-allelic columns, the major allele will always be the first and the minor allele the second in the list.

Parameters:colIdx : int
Returns:value : list
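
The counting and ordering rules described above can be sketched in plain Python (an illustration, not the ghi API), using the worked example from this section:

```python
from collections import Counter

genotypes = ['A_A', 'A_B', 'A_A', 'B_B', '?_?']

# Count every allele across the column, splitting each genotype on '_'.
counts = Counter(a for g in genotypes for a in g.split('_'))

# The missing allele ('?') is always listed last, regardless of count;
# the remaining alleles are ordered by descending count (major first).
missing = counts.pop('?', 0)
result = [[a, n] for a, n in sorted(counts.items(), key=lambda kv: -kv[1])]
if missing:
    result.append(['?', missing])

print(result)  # [['A', 5], ['B', 3], ['?', 2]]
```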
tabular.genotypeColumnData(colIdx, model, refAltFieldNum, multiAltOK)

Returns a list of the numerically parsed values of a genotypic column, with a missing value indicator (None) returned for every missing genotype.

NOTE: If the column has more than two distinct alleles, or (in the case of using reference/alternate alleles) more than one non-reference allele, and you have not allowed multiple alternate alleles or multiple minor alleles, a list consisting of all missing value indicators (None) will be returned.

Parameters:

colIdx : integer

The column number from which to return data

model : integer, (one of below), default: ghi.const.ModelAdditive

The genetic model by which to code the genotypes:

where d = major allele and D = minor allele, or r = reference allele and A = alternate allele.

refAltFieldNum : integer, default: 0

The field number of the reference field to obtain genotype codes by reference allele/alternate allele.

NOTE: To code by major allele/minor allele, use zero.

multiAltOK : bool, default: False

Whether or not it is OK to have multiple alternate alleles or multiple minor alleles.

Returns:

column : list of doubles

list containing genotype values according to the current model and/or missing value indicators (None) (see NOTE above)

Examples

>>> col1Vals = myDatamodel.genotypeColumnData(1)
>>> #
>>> # Straightforward case. Additive model ("dosage") data is obtained 
>>> # from column 1 with 0's returned for homozygous major allele 
>>> # genotypes. If there are more than two distinct alleles in this
>>> # marker, all values returned will be missing value
>>> # indicators (`None`).
>>>
>>> col3Vals = myDatamodel.genotypeColumnData(3, ghi.const.ModelDominant, 2, True)
>>> #
>>> # Dominant model data is obtained from column 3, with marker map 
>>> # field 2 being used to indicate the reference allele. Even if there
>>> # is more than one alternate allele in this marker, values (zero if
>>> # both alleles are reference, one otherwise) will be returned unless
>>> # the genotype is actually a missing-value genotype (`?_?`).
>>> 
tabular.rowLabel(rowNumber)

Returns a row label for the given row number.

Returns:rowLabel : a row label

Examples

>>> myDatamodel.rowLabel(1)
'Sample 1'
>>> myDatamodel.cell(1, 0)
'Sample 1'
>>> 

Datamodel Module

This module contains commands that act on a datamodel object. The commands listed under Tabular Module will also act on a datamodel object.

datamodel.cell(rowNumber, colNumber)

Obtains the value from the specified cell in the datamodel.

Parameters:

rowNumber : integer, >= 0

The row number for desired row, 0 = column name headers

colNumber : integer, >= 0

The column number for desired column, 0 = row labels

Returns:

cellValue : various types

Examples

>>> value = myDatamodel.cell(1,3)
>>> value
'A_G'
>>> 
datamodel.col(colNumber)

Returns all data except the column name header from the specified column (1-based index) from the current datamodel object in a Python list.

Parameters:

colNumber : integer, >= 0

The column number of desired column, 0 = row labels

Returns:

column : list of single value type

list of data from a column, by design columns will have only data with the same type in them.

Examples

>>> column = myDatamodel.col(5)
>>> column[0:5]
[62.385, 65.5234, ?, 76.868]
>>> 
datamodel.zcol(colNumber)

Returns all data except the column name header from the specified column (0-based index) from the current datamodel object in a Python list.

Parameters:

colNumber : integer, >= 0

The column number of desired column, 0 = first column of data

Returns:

column : list of single value type

list of data from a column, by design columns will have only data with the same type in them.

Examples

>>> zcolumn = myDatamodel.zcol(4)
>>> zcolumn[0:5]
[62.385, 65.5234, ?, 76.868]
>>> 
datamodel.colHeaders()

Returns a Python list of the column headers.

Returns:columnHeaders : list of column headers

Examples

>>> colHeads = myDatamodel.colHeaders()
>>> colHeads[0:5]
['weight (lbs)', 'height (in)', 'age', 'Lab 1']
>>> 
datamodel.row(rowNumber)

Returns a Python list of elements in the current datamodel object from the specified row number (1-based).

Parameters:

rowNumber : integer, >= 0

The row number of desired row, 0 = column headers

Returns:

row : list of elements from specified row, various types

Examples

>>> row = myDatamodel.row(3)
>>> row[0:5]
[17.370, 1, 'Low','A_B']
>>> 
datamodel.zrow(rowNumber)

Returns a Python list of elements in the current datamodel object from the specified row number (0-based).

Parameters:

rowNumber : integer, >= 0

The row number of desired row, 0 = first row of data

Returns:

row : list of elements from specified row, various types

Examples

>>> zrow = myDatamodel.zrow(2)
>>> zrow[0:5]
[17.370, 1, 'Low','A_B']
>>> 
datamodel.rowLabels()

Returns a Python list of the row labels for the datamodel object.

Returns:rowLabels : list of row labels

Examples

>>> rowLabs = myDatamodel.rowLabels()
>>> rowLabs[0:5]
['Sample 1', 'Sample 2', 'Sample 3', 'Sample 4']
>>> 
datamodel.subsetColumns(colSubset)

Creates a new datamodel object based on the subset of columns provided in the 1-based list of indexes.

Parameters:

colSubset : integers, >= 0

The column numbers of desired columns

Returns:

subsetDatamodel : PyDataModel object based on column subset

Examples

>>> selectCols = range(1,myDatamodel.numCols()+1,2)
>>> selectCols[0:6]
[1, 3, 5, 7, 9, 11]
>>> subsetDatamodel = myDatamodel.subsetColumns(selectCols)
>>> 
datamodel.subsetRows(rowSubset)

Creates a new datamodel object based on the subset of rows provided in the 1-based list of indexes.

Parameters:

rowSubset : integers >= 0

The row numbers of desired rows

Returns:

subsetDatamodel : PyDataModel object based on row subset

Examples

>>> selectRows = range(2, myDatamodel.numRows()+1,3)
>>> selectRows[0:5]
[2, 5, 8, 11]
>>> subsetDatamodel = myDatamodel.subsetRows(selectRows)
>>> 
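
The 1-based index lists accepted by subsetColumns() and subsetRows() can be sketched in pure Python (subset_one_based is a hypothetical helper, not an API function):

```python
# Hypothetical sketch of the 1-based subsetting that subsetColumns() and
# subsetRows() perform: index 1 selects the first data column/row.
def subset_one_based(items, indexes):
    """Return the elements of items selected by 1-based indexes."""
    return [items[i - 1] for i in indexes]

columns = ['weight (lbs)', 'height (in)', 'age', 'Lab 1']
# Every other column, as in the subsetColumns() example above
select = list(range(1, len(columns) + 1, 2))   # [1, 3]
print(subset_one_based(columns, select))       # ['weight (lbs)', 'age']
```
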
datamodel.subsetSpreadsheet()

Creates and returns a subset spreadsheet based on the columns and rows in the current datamodel. The subset spreadsheet will be a child of the spreadsheet this datamodel was based on.

Returns:spreadsheet : PySpreadsheet Object

Examples

>>> subsetSS = myDatamodel.subsetSpreadsheet()
>>> 
datamodel.colIndex(colNumber)

Returns the index of a given column in the original spreadsheet associated with the current datamodel. Because a datamodel can contain a subset of the columns from the original spreadsheet, this function allows mapping back to the original spreadsheet indexes for changing their state and other actions.

Parameters:

colNumber : integer, >= 0

The column number of the current datamodel column, 0 = row label column

Returns:

index : integer

The index of the specified column in the original spreadsheet

Examples

>>> ss = ghi.getCurrentObject()
>>> ss.setColState(range(1,ss.numCols()+1,2), ghi.const.StateInactive)
>>> myDatamodel = ss.dataModel()
>>> index = myDatamodel.colIndex(5)
>>> index
10
>>> 
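
The mapping that colIndex() performs can be sketched in pure Python (SubsetModel is a hypothetical stand-in mirroring the session above, where the odd-numbered columns were made inactive):

```python
# Hypothetical sketch of what colIndex() does: a datamodel built from a
# spreadsheet with the odd-numbered columns inactive only "sees" the
# remaining active columns, and colIndex() maps a datamodel column back
# to its index in the original spreadsheet.
class SubsetModel:
    def __init__(self, active_indexes):
        # active_indexes[i] is the original index of datamodel column i+1
        self._active = active_indexes

    def colIndex(self, col_number):
        return self._active[col_number - 1]

# Columns 2, 4, 6, ... remained active after deactivating the odd ones
model = SubsetModel([2, 4, 6, 8, 10, 12])
print(model.colIndex(5))  # 10, matching the session above
```
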
datamodel.rowIndexes()

Returns the row indexes in the original spreadsheet for all the rows in the current datamodel.

Returns:indexes : integers

Examples

>>> indexes = myDatamodel.rowIndexes()
>>> 
datamodel.colIndexes()

Returns the column indexes in the original spreadsheet for all of the columns in the datamodel.

Returns:indexes : integers

Examples

>>> indexes = myDatamodel.colIndexes()
>>> 
datamodel.hasPedFields()

This command indicates whether the current datamodel object contains pedigree columns.

Returns:

success : bool

Whether or not the datamodel contains pedigree columns

Examples

>>> myPedDataModel.hasPedFields()
True
>>> myOtherSS.hasPedFields()
False
>>> 

Dataset Builder Module

This module contains commands that act on a dataset builder object.

datasetbuilder.addBoolColumn(name, data)

Adds a column of binary values to the new dataset.

Parameters:

name : string

Column name header for the new column

data : bools

Column of binary data, length of column is identical to number of rows specified when dataset builder was created

datasetbuilder.addCategoricalColumn(name, data)

This command adds a column of categorical or string values to the new dataset.

Parameters:

name : string

Column name header for the new column

data : strings

Column of categorical strings, length of column is identical to number of rows specified when dataset builder was created

datasetbuilder.addColumn(name, data, defaultType)

Adds a column of values to the new dataset.

The type of the column is auto-detected; the data can also be a numpy array. If all the data is missing, then defaultType specifies the type of the new column.

Parameters:

name : string

Column name header for the new column

data : python or numpy array of uniformly typed values

defaultType : integer, default: ghi.const.DataTypeBinary, optional

The column type to use when all values in data are missing
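
The auto-detection described above can be sketched in pure Python (detect_column_type and its type labels are hypothetical; missing values are represented here as None):

```python
# Hypothetical sketch of the type auto-detection addColumn() describes:
# detect a uniform column type, falling back to a default when every
# value is missing (represented here as None).
def detect_column_type(data, default_type='binary'):
    present = [v for v in data if v is not None]
    if not present:
        return default_type          # all missing: use defaultType
    if all(isinstance(v, bool) for v in present):
        return 'binary'              # check bool before int: bool is an int
    if all(isinstance(v, int) for v in present):
        return 'integer'
    if all(isinstance(v, float) for v in present):
        return 'real'
    return 'categorical'

print(detect_column_type([1.5, None, 2.25]))   # 'real'
print(detect_column_type([None, None]))        # 'binary'
```
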

datasetbuilder.addGenotypicColumn(name, data)

Adds a column of genotypic string values (allele values separated by an underscore such as A_B) to the new dataset.

Parameters:

name : string

Column name header for the new column

data : strings

Column of genotypic strings, length of column is identical to number of rows specified when dataset builder was created

datasetbuilder.addGeneticColumn(name, data)

Deprecated: see addGenotypicColumn

datasetbuilder.addIntColumn(name, data)

Adds a column of integer values to the new dataset.

Parameters:

name : string

Column name header for the new column

data : integers

Column of integer values, length of column is identical to number of rows specified when dataset builder was created

datasetbuilder.addIntegerColumn(name, data)

Adds a column of integer values to the new dataset.

See also

addIntColumn

datasetbuilder.addRealColumn(name, data, doublePrecision)

Adds a column of real values to the new dataset. The real values can either be added as single-precision floats or as double-precision floating point values.

Parameters:

name : string

Column name header for the new column

data : doubles

Column of real values, either single- or double- precision, length of column is identical to number of rows specified when dataset builder was created

doublePrecision : bool, default: True, optional

Whether or not to store values as double-precision floating points

datasetbuilder.addRowLabels(name, data)

Adds a column of row labels to the new dataset.

Parameters:

name : string

Row label header for the new column of row labels

data : strings

Column of row labels, length of column is identical to number of rows specified when dataset builder was created

datasetbuilder.addMarkerMap(mapName, chrs, positions, offset, orientation)

Adds a marker map to the dataset. The marker map will have the name mapName (if saved outside the spreadsheet) and contain the chromosome and position information provided. Once a map is applied, no other columns can be added.

Parameters:

mapName : string

The marker map name

chrs : strings

The chromosome names for each marker

positions : integers

Position in chromosome for each marker

offset : integer, default: 0

The 0-based index where the mapped markers begin

orientation : integer (one of below)

Indicates whether markers are along rows or columns

datasetbuilder.addMapStringField(name, data)

Adds a string field called name to the marker map for this dataset.

Parameters:

name : string

data : strings

datasetbuilder.addMapIntField(name, data)

Adds an integer field called name to the marker map for this dataset.

Parameters:

name : string

data : integers

datasetbuilder.addMapRealField(name, data)

Adds a real field called name to the marker map for this dataset.

Parameters:

name : string

data : doubles

datasetbuilder.finish(parent)

Adds a new dataset and spreadsheet to the Project Navigator Window, and returns a new spreadsheet object.

Parameters:

parent : integer, optional

Specifies the ID of the parent node for the new spreadsheet

Returns:

value : PySpreadsheet

datasetbuilder.isValid()

Checks to see if the dataset builder object is still valid.

Returns:

value : bool (one of below)

  • True: dataset builder object is still valid, more columns can be added
  • False: dataset builder object is not valid, no more columns can be added
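
The builder contract implied by isValid() and finish() can be sketched in pure Python (SketchDatasetBuilder is a hypothetical stand-in, not the real dataset builder):

```python
# Hypothetical sketch of the builder contract: every column must match the
# row count fixed at construction, and once finish() is called the builder
# is no longer valid (isValid() returns False, mirroring the API above).
class SketchDatasetBuilder:
    def __init__(self, num_rows):
        self._num_rows = num_rows
        self._columns = []
        self._valid = True

    def addColumn(self, name, data):
        if not self._valid:
            raise RuntimeError('builder is no longer valid')
        if len(data) != self._num_rows:
            raise ValueError('column length must equal the row count')
        self._columns.append((name, list(data)))

    def isValid(self):
        return self._valid

    def finish(self):
        self._valid = False            # no more columns can be added
        return self._columns

builder = SketchDatasetBuilder(num_rows=3)
builder.addColumn('age', [34, 29, 41])
dataset = builder.finish()
print(builder.isValid())  # False
```
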

Marker Map Builder by Marker Module

This module contains commands that act on a marker map builder by marker object.

markermapbuilderbymarker.addMarker(marker, chr, position, extraFields)

This command adds a single marker with the given name, chr, position and list of field data.

Parameters:

marker : string

Name for the marker

chr : string

The chromosome for the marker

position : int

The position for the marker

extraFields : variable list

A list containing data for each map field. The types and length of the list must match the types and length specified when this builder was started.

Returns:

success : bool
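
The type-and-length check that addMarker() applies to extraFields can be sketched in pure Python (check_extra_fields and the declared field types are hypothetical):

```python
# Hypothetical sketch of the extraFields check addMarker() implies: the
# list must match, value for value, the field types declared when the
# builder was started.
DECLARED_FIELD_TYPES = [str, int, float]   # e.g. a string, int and real field

def check_extra_fields(extra_fields, declared=DECLARED_FIELD_TYPES):
    if len(extra_fields) != len(declared):
        return False
    return all(isinstance(value, ftype)
               for value, ftype in zip(extra_fields, declared))

print(check_extra_fields(['rs123', 5, 0.25]))   # True
print(check_extra_fields(['rs123', 5]))         # False: wrong length
```
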

markermapbuilderbymarker.finish(forProjectUseOnly=False)

Finish the started marker map.

By default, when forProjectUseOnly is False, moves the completed marker map DSM file to the Genetic Marker Maps folder.

When forProjectUseOnly is True, the completed marker map file is left in the project tmp folder. The completed marker map may then be used as the mapping for a newly-built (or other) spreadsheet by performing spreadsheet.setMarkerMap, using this marker map builder as the first parameter for that command.

This command returns the name of the marker map file that was written.

Parameters:

forProjectUseOnly : bool, optional

Indicates if marker map is for internal project use

  • False (default): move to the Genetic Marker Maps folder
  • True: use with setMarkerMap as a project-internal marker map

Returns:

fileName : string

This command returns the name of the marker map file that was written.

Marker Map Builder by Field Module

This module contains commands that act on the marker map builder by field module.

markermapbuilderbyfield.addIntField(name, data)

This command adds an optional integer field to the marker map.

Parameters:

name : string

Name for the marker map integer field

data : ints

List of integers to add

Returns:

success : bool

markermapbuilderbyfield.addRealField(name, data)

This command adds an optional field of real values, double-precision, to the marker map.

Parameters:

name : string

Name for the marker map real field

data : doubles

List of reals to add

Returns:

success : bool

markermapbuilderbyfield.addStringField(name, data)

This command adds an optional field of strings to the marker map.

Parameters:

name : string

Name for the marker map string field

data : strings

List of strings to add

Returns:

success : bool

markermapbuilderbyfield.finish(forProjectUseOnly=False)

Finish the started marker map.

By default, when forProjectUseOnly is False, moves the completed marker map DSM file to the Genetic Marker Maps folder.

When forProjectUseOnly is True, the completed marker map file is left in the project tmp folder. The completed marker map may then be used as the mapping for a newly-built (or other) spreadsheet by performing spreadsheet.setMarkerMap, using this marker map builder as the first parameter for that command.

This command returns the name of the marker map file that was written.

Parameters:

forProjectUseOnly : bool, optional

Indicates if marker map is for internal project use

  • False (default): move to the Genetic Marker Maps folder
  • True: use with setMarkerMap as a project-internal marker map

Returns:

fileName : string

This command returns the name of the marker map file that was written.

Result Viewer Module

This module contains commands that act on a result viewer object. The commands listed under Navigator Node Module will also act on a result viewer object.

Result viewers can display monospaced text content such as a log file, custom HTML, or structured data built using the add[*] functions below.

Custom Result Viewer

Example custom structured data Result Viewer

resultviewer.asText()

Returns the data currently in the result viewer as a string. Unless the result viewer holds plain text, the text returned will be in HTML format.

Returns:value : string
resultviewer.asData()

Returns the data currently in the result viewer as a python dictionary or list of dictionaries. If there are multiple titles in the viewer, a list of dicts, one for each title, will be returned. 2-column tables are entered as key-value pairs. Tables with more than 2 columns are inserted as matrices (2-dimensional lists) under the key ‘tableN’ where N is the table number in increasing order starting at 1.

Returns:value : dict or list of dicts
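
The dictionary shape described above can be sketched in pure Python (results_to_dict is a hypothetical illustration, not an API function):

```python
# Hypothetical sketch of the dictionary shape asData() describes:
# 2-column tables become key/value pairs, wider tables become matrices
# stored under 'tableN' with N counting up from 1.
def results_to_dict(tables):
    result, wide_count = {}, 0
    for table in tables:
        if table and len(table[0]) == 2:
            for key, value in table:           # 2 columns: key/value pairs
                result[key] = value
        else:                                  # >2 columns: numbered matrix
            wide_count += 1
            result['table%d' % wide_count] = table
    return result

tables = [[['Some Stat', 0.9999], ['Count', 10]],
          [['Row 1', 0.25, 10], ['Row 2', 0.75, 11]]]
print(results_to_dict(tables))
# {'Some Stat': 0.9999, 'Count': 10, 'table1': [['Row 1', 0.25, 10], ['Row 2', 0.75, 11]]}
```
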
resultviewer.addHeader(header)

Add a primary header to a structured data result-holder

Parameters:

header : string

Header label

Examples

>>> rv = ghi.createResultViewer("Custom Result Viewer")
>>> rv.addHeader("Custom Results")
>>> 
resultviewer.addSubHeader(header)

Add a sub-header to a structured data result-holder

Parameters:

header : string

Header label

Examples

>>> rv = ghi.createResultViewer("Custom Result Viewer")
>>> rv.addSubHeader("Some Sub Header")
>>> 
resultviewer.addPairedList(pairedList)

Add a list of key/value pairs to a structured data result-holder.

The values can be a python integer, float or string.

Parameters:

pairedList : list of 2-value lists

Each list item should be a 2-tuple of a string and displayable value

Examples

>>> rv = ghi.createResultViewer("Custom Result Viewer")
>>> rv.addPairedList([["Some Stat",0.9999],["Some Param","Param Value"],
                      ["Count",10]])
>>> 
resultviewer.addTable(headers, matrix)

Add a table with headers to a structured data result-holder

The headers should be a list of strings and the values a 2-d table of displayable primitive values (integer, float, string, etc.).

Parameters:

headers : list of strings

Column header labels

matrix : list of lists of values

Although the format is not strict (the values are simply auto-formatted into HTML table constructs), the outer list should contain inner lists of uniform length.

Examples

>>> rv = ghi.createResultViewer("Custom Result Viewer")
>>> rv.addTable(["Labels","Stat1","Stat2"],[["Row 1",0.25,10],
                ["Row 2",0.75,11],["Row 3",0.99,-5]])
>>> 
resultviewer.addList(header, items)

Add a list of items with a header to a structured data result-holder

Parameters:

header: string :

Header label

items : list of values

Values can be of type integer, float or string

Examples

>>> rv = ghi.createResultViewer("Custom Result Viewer")
>>> rv.addList("Detected Chromosomes",["1","2","X","Y"])
>>> 

Genome Map Module

This module contains commands that act on a genome map object.

genomemap.clear()

Clears the contents of the genome map.

genomemap.title()

Returns the title text.

Returns:value : string
genomemap.setTitle(title)

Changes the title text.

Parameters:

title : string

The new title text

genomemap.taxId()

Returns the taxonomy ID of the species for this map, if available.

Returns:value : string
genomemap.setTaxId(taxId)

Changes the taxonomy ID for this map.

Parameters:

taxId : string

The new taxonomy ID

genomemap.coordSysId()

Returns the Coordinate System ID for this track. SVS recognizes many IDs defined by http://www.dasregistry.org/.

Returns:value : string
genomemap.setCoordSysId(coordSysId)

Changes the Coordinate System ID.

Parameters:

coordSysId : string

The new Coordinate System ID

genomemap.authority()

Returns the authority component of the Coordinate System ID. In the example ‘NCBI_36,Chromosome,Homo sapiens’ the authority is ‘NCBI_36’.

Returns:value : string
genomemap.type()

Returns the type component of the Coordinate System ID. In the example ‘NCBI_36,Chromosome,Homo sapiens’ the type is ‘Chromosome’.

Returns:value : string
genomemap.species()

Returns the species component of the Coordinate System ID. In the example ‘NCBI_36,Chromosome,Homo sapiens’ the species is ‘Homo sapiens’.

Returns:value : string
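
Splitting a Coordinate System ID into these three components can be sketched in pure Python (split_coord_sys_id is a hypothetical helper, not an API function):

```python
# Hypothetical helper splitting a Coordinate System ID into the authority,
# type and species components returned by authority(), type() and species().
def split_coord_sys_id(coord_sys_id):
    authority, cs_type, species = coord_sys_id.split(',', 2)
    return authority, cs_type, species

print(split_coord_sys_id('NCBI_36,Chromosome,Homo sapiens'))
# ('NCBI_36', 'Chromosome', 'Homo sapiens')
```
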
genomemap.commonNames()

Returns the list of common names available to describe this map.

Returns:value : strings
genomemap.addCommonName(name)

Appends a new common name to this map.

Parameters:name : string
genomemap.clearCommonNames()

Removes all names from the list of common names.

genomemap.date()

Returns the date this map was created.

Returns:value : string
genomemap.setDate(date)

Updates the date this map was created.

Parameters:

date : string

The new date for this map

genomemap.saveXml(filename)

Writes this genome map to an XML file.

Parameters:filename : string
Returns:value : bool
genomemap.md5()

Returns the MD5 hash that represents this map.

Returns:value : string

Genome Browser Module

This module contains commands that act on a genome browser object.

genomebrowser.openForRead(url)

Opens a track for reading, and returns a reader object.

The url parameter should be a file name or a file name with the :<source_id> appended. The file name may be either a relative or an absolute file name. A relative file name will be understood to be in the annotations folder for the program. Absolute filenames (such as ‘C:/temp/testFile.tsf’) can be used as well. For convenience, certain file locations can be described using macros:

  • ‘%DATAPATH%’ - References the annotation folder in the user data directory.
  • ‘%SYSTEMPATH%’ - References the genome maps folder in the application’s installation directory.

To reference a file in the system directory, for example: ‘%SYSTEMPATH%/RefSeqGenes-UCSC_GRCh_37_Homo_sapiens.tsf’.

Parameters:

url : string

The name of an IDF file to open followed by ":" and the source ID, such as ‘RefSeqGenes-UCSC_GRCh_37_Homo_sapiens.tsf:1’, or a network URL.

Returns:

reader : TrackReader object

See also

createTrack

Examples

>>> filebase = "rfs://data.goldenhelix.com/rfs/"
>>> file = "RefSeqGenes63-UCSC_2014-02-16_GRCh_37_Homo_sapiens.tsf:1"
>>> reader = ghi.genomebrowser.openForRead(filebase + file)
>>> reader
trackreader(rfs://data.goldenhelix.com:80/rfstest/RefSeqGenes63
-UCSC_2014-02-16_GRCh_37_Homo_sapiens.tsf:1)
>>> 
genomebrowser.createTrack(url, title, type, coordSysId, schemaDesc, **docArgs)

Creates a new track and returns a writer object for the track.

The url parameter should be a file name or a file name with the :<source_id> appended. The file name may be either a relative or an absolute file name. A relative file name will be understood to be in the annotations folder for the program. Absolute filenames (such as ‘C:/temp/testFile.tsf’) can be used as well. For convenience, certain file locations can be described using macros:

  • ‘%DATAPATH%’ - References the annotation folder in the user data directory.
  • ‘%SYSTEMPATH%’ - References the genome maps folder in the application’s installation directory. (Note: do not write here)

If no source ID is provided, the next one in sequence will be used for files that have existing tracks (1 by default).

The title of the track is the name to be displayed when tracks are listed.

The schema parameter is a tab delimited set of field descriptions which specify the name and data type of the field.

The coordSysId parameter defines the coordinate system identifier. For example ‘GRCh_37,Chromosome,Homo sapiens’.

The optional keyword arguments are supported to provide extra documentation about the source.

Parameters:

url : string

The name of an IDF file in the program annotations directory such as ‘RefSeqGenes-UCSC_GRCh_37_Homo_sapiens.tsf’ or a literal path, such as ‘c:/temp/output.tsf’

title : string

The title (name) of the track

type : string

The data type identifier, which influences the drawing of the track. One of [‘generic’, ‘cyto’, ‘probe’, ‘variant’, ‘gene’, ‘intensity’, ‘char sequence’]

coordSysId : string

The coordinate system identifier, i.e. ‘GRCh_37,Chromosome, Homo sapiens’

schemaDesc : string

A data format description of the data to be written to the track. Each interval written to the track must follow the same format. Fields in the schema are tab delimited, and describe the name and data type for each interval, in the form ‘[name]=[type]’.

The types supported are:
  • ? Boolean
  • b Byte
  • s String (NULL terminated)
  • e Enum (categorical)
  • i Integer
  • f Single Precision Real
  • f4 Single Precision Real
  • f8 Double Precision Real

Note that for enum fields, you still pass in strings, but an error will be thrown if more than 200 unique strings are seen for an enum field.

An example schema describing two fields, a string field called ‘Name’ and a single-precision real field called ‘Value’, would be the tab-delimited string: 'Name=s      Value=f'.

A type can also be described as a list type, using the ‘@’ symbol prior to the type, i.e. 'Values=@f'.

If schemaDesc is empty, a default schema based on type is used.

curatedBy : string (optional)

Name of user or organization. I.e. ‘Golden Helix, Inc.’

seriesName : string (optional)

Short string to represent that this source is a versioned series. For example ‘1kG’ is the series name of the 1000 Genomes variant source. The Data Source Library only shows the latest curated source of a given series by default.

sourceVersion : string (optional)

The user-facing name of the version for this series. If the series was dbSNP, then the version string might be ‘138’ for dbSNP build 138. Other sources without official naming can just use the date they were released, such as 2014-01-01.

descriptionHtml : string (optional)

HTML description of the source.

sourceCreditHtml : string (optional)

HTML crediting the source. Please put hyperlinks to original content and paper citations here.

curationNotesHtml : string (optional)

HTML notes on any transformations or choices made while curating the source.

headerLines : string list (optional)

List of lines of meta-data, derived from the header of the source. For text files, this may be the lines starting with ‘#’

fieldDocs : dict of string to strings (optional)

Optional documentation strings for fields. The keys of this dictionary are the field names, and the values are the strings to be used to document the fields.

fieldUrls : dict of string to strings (optional)

Optional urlTemplate strings for fields. The keys of this dictionary are the field names, and the values are the strings to be used as the URL template for the value of the field. For example, RSID fields should have URL template of http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=$$

Returns:

writer : TrackWriter object

See also

openForRead

Examples

>>> myWriter = ghi.genomebrowser.createTrack('%DATAPATH%/track.tsf', 
               'Track Title', 'generic', 'GRCh_37,Chromosome,Homo sapiens', 
               'Value=f')
>>> myWriter = ghi.genomebrowser.createTrack('conservationTrack.tsf', 
               'Mammalian Conservation', 'Intensity', 
               'GRCh_37,Chromosome,Homo sapiens', 'Conservation=f', 
               descriptionHtml='<p>A <b>Conservation</b> track</p>')
>>> myWriter
trackwriter(~/.local/Golden Helix SVS/Annotations/conservationTrack.tsf:1)
>>>
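
The schemaDesc format can be sketched with a small pure-Python parser (parse_schema_desc is a hypothetical helper; the type codes are those listed above):

```python
# Hypothetical parser for the tab-delimited schemaDesc format described
# above: each field is '[name]=[type]', with '@' marking a list type.
VALID_TYPES = {'?', 'b', 's', 'e', 'i', 'f', 'f4', 'f8'}

def parse_schema_desc(schema_desc):
    fields = []
    for entry in schema_desc.split('\t'):
        name, type_code = entry.split('=')
        is_list = type_code.startswith('@')
        base = type_code[1:] if is_list else type_code
        if base not in VALID_TYPES:
            raise ValueError('unknown field type: %r' % type_code)
        fields.append((name, base, is_list))
    return fields

print(parse_schema_desc('Name=s\tValues=@f'))
# [('Name', 's', False), ('Values', 'f', True)]
```
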
genomebrowser.createTabular(url, title, schemaDesc, **docArgs)

Creates a new tabular, non-plottable data source.

The title of the track is the name to be displayed when tracks are listed.

The schema parameter is a tab delimited set of field descriptions which specify the name and data type of the field.

The TrackWriter returned from this call should only have its writeRecord method called, as writeFeature is designed for track (createTrack) sources that have genomic coordinates.

Returns:writer : TrackWriter object

Examples

>>> myWriter = ghi.genomebrowser.createTabular('%DATAPATH%/table.tsf', 
               'Track Title', 'Value=f')
>>> myWriter.writeRecord([1.234])
>>>
genomebrowser.deleteTrack(url)

Removes the track from the file, or removes the file if it is the last track in the file.

Parameters:

url : string

The name of a file in the program annotations directory such as ‘RefSeqGenes-UCSC_GRCh_37_Homo_sapiens.tsf’ or a literal path, such as ‘c:/temp/output.tsf’

Examples

>>> ghi.genomebrowser.deleteTrack("%DATAPATH%/track.tsf")
>>> 

genomebrowser.localSourceList()

Returns a list of all the tracks found in the project annotation directory.

Each item in the returned list corresponds to one track. Information included for each track is:

  • track name
  • a short description including a short track name and source
  • coordinate system including authority, data type and species
  • track type such as ‘cyto’, ‘gene’, etc.
Returns:value : list

Examples

>>> myList = ghi.genomebrowser.localSourceList()
>>> myList
[[u'C:/Users/Guest/Documents/GHI/test2/cmh test/annotations/track.tsf:1', 
  u'Track Title', u'GRCh_37,Chromosome,Homo sapiens', u'Interval'], 
 [u'C:/Users/Guest/Documents/GHI/test2/cmh test/annotations/track.tsf:2', 
  u'Track Title', u'GRCh_37,Chromosome,Homo sapiens', u'Interval']]
>>> 
genomebrowser.defaultReferenceUrl(build=None, allowNetworkUrl=False)

Returns the URL of the default reference sequence track for the given build.

If the coordinate system of build is empty, the project or global default is used.

The following order of precedence is followed if multiple sources are available:

  • The user annotation folder
  • Other local folders added to the Data Source Library
  • The public annotation repository

NOTE: By default no URL is returned if the only reference sequence source is network based. You can override this by setting the second parameter to True.

Returns:url : string

Examples

>>> ghi.genomebrowser.defaultReferenceUrl('GRCh_37,Chromosome,Homo sapiens')
'/Volumes/MicroDrive/Annotations/ReferenceSequence-UCSC_GRCh_37_Homo_sapiens.tsf:1'
>>>
genomebrowser.defaultCytobandUrl(build=None)

Returns the URL of the default cytobands track for the given build.

If the coordinate system of build is empty, the project or global default is used.

The following order of precedence is followed if multiple sources are available:

  • The user annotation folder
  • Other local folders added to the Data Source Library
  • The public annotation repository
Returns:url : string

Examples

>>> ghi.genomebrowser.defaultCytobandUrl('GRCh_37,Chromosome,Homo sapiens')
'/Users/me/Library/Application Support/Golden Helix SVS/Annotations/Cytobands
2009-06-12-UCSC_GRCh_37_Homo_sapiens.tsf:1'
>>> 
genomebrowser.defaultGeneUrl(build=None)

Returns the URL of the default gene track for the given build.

If the coordinate system of build is empty, the project or global default is used.

The following order of precedence is followed if multiple sources are available:

  • The user annotation folder
  • Other local folders added to the Data Source Library
  • The public annotation repository

Within the preferred folder, if there are multiple gene tracks, the RefSeq gene series is preferred, along with the latest curated track of that series.

Returns:url : string

Examples

>>> ghi.genomebrowser.defaultGeneUrl()
'/Users/grudy/Library/Application Support/Golden Helix SVS/Annotations/
RefSeqGenes63-UCSC_2014-02-16_GRCh_37_Homo_sapiens.tsf:1'
>>> 
genomebrowser.requiresPrecompute(url)

Checks whether the given url requires some precompute or indexing before being able to be read.

Parameters:

url : string

The name of a source file

Returns:

message : string

If empty, no precompute is necessary. Otherwise a status message returned.

See also

runPrecompute

genomebrowser.runPrecompute(url, message, options)

Run the source precompute engine for a given source with a specified url.

VCF files for example require being bgzip compressed and tabix indexed before reading. This will thus run that compression step. Similarly it will create a BAI index for BAM sources.

Parameters:

url : string

The name of a source file to run precompute on.

message : string

The message to display to the user in the progress dialog.

removeUncompressed : bool, optional

If true, removes the uncompressed file if we have compressed a file during the precompute.

writeToPath : string, optional

If set, new files will be written to specified path instead of beside the source.

Returns:

wasNotCanceled : bool

genomebrowser.runAutoPrecompute(url, indexSymbolList=[])

Run the set of auto computations for a given source with a specified url.

This will include indexing and coverage computations for most sources. For newly written TSF files, this includes indexing some fields by default for gene tracks and can optionally include indexing the fields with the field symbols specified in indexSymbolList.

Parameters:

url : string

The name of a source file to run precompute on.

indexSymbolList : stringlist

List of field symbols to index. Must be string fields

Returns:

wasNotCanceled : bool

False if was canceled. Returns true also when no computations are necessary.

See also

runPrecompute

Examples

>>> # Only index gene name and transcript ID fields to make them searchable
>>> ghi.genomebrowser.runAutoPrecompute('C:/Users/lname/AppData/Local/Golden Helix/' + 
    'Common Data/Annotations/RefSeqGenes105-NCBI_2013-08-20_GRCh_37_g1k_Homo_sapiens.tsf:1')
True
>>> # Index gene name, transcript ID, MIM, and CCDS fields to make them searchable
>>> ghi.genomebrowser.runAutoPrecompute('C:/Users/lname/AppData/Local/Golden Helix/' + 
    'Common Data/Annotations/RefSeqGenes105-NCBI_2013-08-20_GRCh_37_g1k_Homo_sapiens.tsf:1',
    ['MIM','CCDS'])
True
>>> 

Track Reader Module

This module contains commands that act on a track reader object.

trackreader.coordSysId()

Returns the coordinate system of the track, i.e. ‘GRCh_37,Chromosome,Homo sapiens’.

Returns:

coordSys : string

Coordinate system of the track

Examples

>>> idString = myReader.coordSysId()
>>>
>>> idString
u'GRCh_37,Chromosome,Homo sapiens'
>>> 
trackreader.coverageSpace()

This command returns a list of the genomic segments, one segment per chromosome. Each segment is a list with three elements: [chromosome, first index, last index].

Returns:genomicSegments : list of segments

Examples

>>> regionList = myReader.coverageSpace()
>>> 
>>> regionList[0:2]
[[u'1', 1,  247249719], [u'2', 1, 242951149]]
>>> 
trackreader.fieldIndexMap()

The field index map is a dictionary of field names to the index within a feature.

Each call to next() returns a feature as a python list containing all the fields contained in the track, or some subset that the reader may be configured for.

Returns:

fieldNames : dictionary

Dictionary of field names to an index in the list returned from next()

See also

nextFeatureSet

Examples

>>> myMap = myReader.fieldIndexMap()
>>> 
trackreader.fieldList()

Returns a list of the field names in the order they are defined in the track’s schema.

Returns:

names : list of strings

List of the field names

Examples

>>> tracklist = myReader.fieldList()
>>> 
trackreader.fieldType(index)

Returns a string in the syntax of the track schema definition for the field specified by index.

Parameters:

index : integer

Field index from the field list.

Returns:

fieldType : string, examples below

  • i: integer
  • s: string
  • f: float
  • @i: list of integers
  • @s: list of strings
  • @f: list of floats
  • zs: compressed string

Examples

>>> fieldType = myReader.fieldType(3)
>>> 
trackreader.hasNext()

This command indicates whether the track reader has another valid feature.

Returns:

success : bool

Whether or not next() will return a valid feature

trackreader.indexOf(fieldName)

Returns the zero-based index of field name if it is valid, -1 otherwise.

Parameters:

fieldName : string

Name of the field to retrieve the index for

Returns:

index : integer

Zero-based index of the field name, if valid

Examples

>>> idx = myReader.indexOf('Name')
>>> 
trackreader.intervalMode()

Returns the interval mode for the track reader.

Returns:

mode : integer (one of below)

Examples

>>> myReader.intervalMode()
1
>>> 
trackreader.setIntervalMode(mode)

Sets the interval mode for the track reader.

Parameters:

mode : integer (one of below)

  • ghi.const.IntervalModeIndexed: Intervals are in indexed coordinates. Indexed coordinates are one-based, and the width of an interval is one plus the difference between the stop and start positions.
  • ghi.const.IntervalModeHalfOpen: Intervals are in half-open coordinates. Half-open coordinates are zero-based, and the difference between the stop and start positions defines the width of an interval (Default).

Examples

>>> myReader.setIntervalMode(ghi.const.IntervalModeIndexed)
>>> 
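
The relationship between the two interval modes can be sketched in pure Python (both helper functions are hypothetical, not part of the API):

```python
# Hypothetical conversion between the two interval modes: indexed
# (1-based, inclusive stop) and half-open (0-based, exclusive stop).
# Width is stop - start + 1 in indexed mode and stop - start in half-open.
def indexed_to_half_open(start, stop):
    return start - 1, stop

def half_open_to_indexed(start, stop):
    return start + 1, stop

# The same 100-base interval in both conventions:
print(indexed_to_half_open(1, 100))   # (0, 100)
print(half_open_to_indexed(0, 100))   # (1, 100)
```
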
trackreader.next()

Returns the next feature in the track, sorted by the genomic position of the feature.

Each feature is returned in the form of an API list containing all or some of the fields defined in the track schema. The content and field ordering of a feature is in corresponding order to the track schema, but can be modified by the field list parameter to read().

Returns:

feature : list

list of field values for the next feature

See also

schema, read

Examples

>>> feature = myReader.next()
>>> 
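A common pattern is to drain a reader with the hasNext()/next() pair documented above. Since the ghi module is not importable outside the application, the sketch below models only those two methods with a stub class; a real track reader would be substituted for StubReader:

```python
# StubReader stands in for a real track reader; only the hasNext()
# and next() methods used by the loop are modeled.

class StubReader:
    def __init__(self, features):
        self._features = list(features)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._features)

    def next(self):
        feature = self._features[self._pos]
        self._pos += 1
        return feature

def collect_features(reader):
    """Drain a reader into a Python list using the documented loop."""
    features = []
    while reader.hasNext():
        features.append(reader.next())
    return features

reader = StubReader([['GENE1', 100, 200], ['GENE2', 300, 400]])
assert collect_features(reader) == [['GENE1', 100, 200], ['GENE2', 300, 400]]
```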
trackreader.nextFeatureSet()

Returns a block of features together.

A useful attribute of this query is that any two features that overlap will be grouped together. It should not be assumed, however, that all features in a set will be mutually overlapping, or necessarily overlap any other features.

Returns:

featureList : list

List of a block of features from the next grouping

Examples

>>> featureList = myReader.nextFeatureSet()
>>> 
trackreader.percentDone()

Returns an integer in the range [0,100] representing the percent progress through the current read. Successive calls to next() will increase the value returned by percentDone().

The estimate may be inaccurate, as data may not be uniformly distributed throughout the genomic space of the track.

Returns:

percent : integer

An estimate of the amount of genomic space covered by all features returned by next() or nextFeatureSet()

Examples

>>> value = myReader.percentDone()
>>> 
trackreader.read(*args)

Configures the reader to begin reading data in the described genomic space. This API can be called in multiple ways:

  1. read(): read the entire track
  2. read(chromosome, startPosition, stopPosition, fieldList): read a single region
  3. read(regionList, fieldList): read multiple regions
Parameters:

For a *single region* :

chromosome : string

Chromosome number or name as a string

startPosition : integer

Start position for the region

stopPosition : integer

End position for the region

fieldList : list of strings

Ordered list of names of fields specifying the structure of features returned by next()

For *multiple regions* :

regionList : list of tuples

List of regions, where each region contains at least a chromosome identifier, but can also contain start and stop positions, i.e. [['1'], ['2', 300, 400]]

fieldList : list of strings

Ordered list of names of fields specifying the structure of features returned by next()

Examples

>>> myReader.read()
>>> myReader.read('1', 10000, 20000, ['Gene Name', 'Transcript Name'])
>>> myReader.read([['1', 10000, 20000], ['1', 30000, 40000]],
                   ['Gene Name','Transcript Name'])
>>> 
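Because regionList entries may have either one element (a whole chromosome) or three (chromosome plus start and stop), it can be useful to validate a region list before passing it to read(). The helper below is plain Python, not part of the ghi API:

```python
def check_region_list(regions):
    """Validate that each region is [chrom] or [chrom, start, stop]."""
    for region in regions:
        if len(region) not in (1, 3):
            raise ValueError(
                'region must be [chrom] or [chrom, start, stop]: %r' % (region,))
        if not isinstance(region[0], str):
            raise ValueError('chromosome must be a string: %r' % (region[0],))
    return regions

# A whole-chromosome region and a bounded region together.
assert check_region_list([['1'], ['2', 300, 400]]) == [['1'], ['2', 300, 400]]
```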
trackreader.schema()

Returns the definition of the track’s data schema.

A schema is a tab-delimited set of field descriptors which specify the name and data type of each field. A descriptor such as 'Value=f' specifies a single field, called 'Value', that is single precision real.

Returns:

fieldDescriptor : string (one of below)

  • ? - Boolean
  • b - Byte
  • s - string (null terminated)
  • i - integer (4 bytes)
  • f - single precision real (4 bytes)
  • f4 - single precision real (4 bytes)
  • f8 - double precision real (8 bytes)

Examples

>>> myReader.schema()
u'Gene Name=s\tTranscript Name=s\tStrand=s\tCDS Start=i\tCDS Stop=i\tExon Count=i\t
Exon Starts=@i\tExon Stops=@i\tCds Start Stat=s\tCds Stop Stat=s'
>>> 
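Since the schema string is a tab-delimited list of 'Name=type' descriptors, it is straightforward to split it into (name, type) pairs in Python. This parser is illustrative, not part of the ghi API:

```python
def parse_schema(schema):
    """Split a tab-delimited 'Name=type' schema string into pairs."""
    fields = []
    for descriptor in schema.split('\t'):
        name, _, ftype = descriptor.partition('=')
        fields.append((name, ftype))
    return fields

# A shortened version of the schema shown in the example above.
schema = u'Gene Name=s\tTranscript Name=s\tExon Starts=@i'
assert parse_schema(schema) == [
    ('Gene Name', 's'), ('Transcript Name', 's'), ('Exon Starts', '@i')]
```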
trackreader.readMode()

Returns the read mode of the track, either ‘overlap’ or ‘exact’.

Returns:

readMode : string (one of below)

  • overlap
  • exact

See also

setReadMode

trackreader.setReadMode(mode)

Sets the mode of the track reader.

Parameters:

mode : integer (one of below)

Indicates the new mode of the reader
  • ghi.const.ReadModeOverlap: reads all features that overlap the requested regions, or any features that by association overlap the requested intervals. (default)
  • ghi.const.ReadModeExact: read only features that exactly fit the requested regions. This mode performs faster in some cases.

Examples

>>> myReader.setReadMode(ghi.const.ReadModeExact)
>>> 
trackreader.title()

Returns the title of the track.

Returns:trackTitle : string

Examples

>>> myReader.title()
u'RefSeq Genes, UCSC'
>>> 
trackreader.type()

Returns the type specifier used by the visualization system to determine how to draw the track’s data.

Returns:

descriptor : string (one of below)

  • Probe
  • Cytoband
  • Gene
  • Intensity
  • Allele Sequence
  • Generic

See also

schema

Examples

>>> myReader.type()
u'Gene'
>>> 
trackreader.uuid()

Returns the Universally Unique Identifier.

This field is useful within the application to identify multiple instances of the same track.

Returns:uuid : string

Examples

>>> myReader.uuid()
u'{f9749b38-68d0-4586-9f38-09d73b112495}'
>>> 
trackreader.write(writer)

Writes the remaining intervals of the current read to the writer.

Parameters:

writer : PyGBTrackWriter

A writer object, created by ghi.genomebrowser.openForWrite(), or ghi.genomebrowser.createTrack()

See also

writeFeature

Examples

>>> myReader.write(myWriter)
>>> 

Track Writer Module

This module contains commands that take data and write a TSF file.

trackwriter.flush()

Blocks execution until all data has been written to disk.

trackwriter.finish()

Finish writing the file. If you don’t call this, it is called automatically when the writer goes out of scope.

Returns:url : url of new source

trackwriter.intervalMode()

Returns the interval mode for the track writer.

Returns:

mode : integer (ghi.const.IntervalModeIndexed or ghi.const.IntervalModeHalfOpen)

Examples

>>> myWriter.intervalMode()
1
>>>
trackwriter.setIntervalMode(mode)

Sets the interval mode for the track writer.

Parameters:

mode : integer (one of below)

  • ghi.const.IntervalModeIndexed: Intervals are in indexed coordinates. Indexed coordinates are one-based, and the width of an interval is one plus the difference between the stop and start positions.
  • ghi.const.IntervalModeHalfOpen: Intervals are in half-open coordinates. Half-open coordinates are zero-based, and the difference between the stop and start positions defines the width of an interval (Default).

Examples

>>> myWriter.setIntervalMode(ghi.const.IntervalModeIndexed)
>>> 
trackwriter.writeFeature(chr, start, stop, data)

Writes a feature to the track.

Parameters:

chr : string

The chromosome to which the feature belongs

start : integer

The start position of the feature interval

stop : integer

The stop position of the feature interval

data : value or list of values

If the data is a single value, the data schema for this track must have a single element (such as Intensity tracks, which take a single real value). Lists of data should be ordered according to the data schema.

Notes

Start and stop are 1-based position index values; thus intervals of width 1 (such as a SNP) have equal start and stop values.

Examples

>>> myWriter.writeFeature('1', 1923450, 1923460, 0.34)
>>> myWriter.writeFeature('1',1923450,1923460, ['p5.1','gneg'])
>>>
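The data-ordering rule above can be sketched without the application by substituting a stub for the real track writer. StubWriter is illustrative only; it records what a real writer would receive, normalizing the single-value case described in the parameter docs:

```python
class StubWriter:
    """Illustrative stand-in for a track writer; records written rows."""

    def __init__(self):
        self.rows = []

    def writeFeature(self, chr, start, stop, data):
        # Normalize a single value to a one-element list, matching the
        # single-field-schema case (e.g. Intensity tracks).
        if not isinstance(data, list):
            data = [data]
        self.rows.append((chr, start, stop, data))

writer = StubWriter()
writer.writeFeature('1', 1923450, 1923460, 0.34)              # single-field schema
writer.writeFeature('1', 1923450, 1923460, ['p5.1', 'gneg'])  # two-field schema
assert writer.rows == [
    ('1', 1923450, 1923460, [0.34]),
    ('1', 1923450, 1923460, ['p5.1', 'gneg']),
]
```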