Count Alleles
This algorithm counts the number of alternate alleles in the genotype field across all of the samples.
Requirements
Requires a genotype (GT) sample level field.
Options
- Sample Grouping: Optionally takes a categorical sample level field and counts the alleles separately for each category. You can add such fields during the import process or use a default field such as Affection Status.
- Remove No-Calls Genotypes: By default, the # Alleles field includes no-call genotypes such as ./., so it will generally be twice the number of samples. If you select this option, no-calls are excluded from this value and the computed Allele Frequencies change to match. This may make sense in multi-sample calling pipelines, but beware: you may see high allele frequencies simply because a variant was called in only a few samples and every other sample was a no-call.
- Output Sample Names: When selected, a new Sample Names field is created that lists the names of the samples containing a variant genotype when the number of samples with this condition passes the specified threshold.
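The effect of the Remove No-Calls Genotypes option on allele frequency can be illustrated with a minimal sketch. This is not the product's implementation: the function name `alt_allele_frequency`, the `remove_no_calls` parameter, the VCF-style genotype strings, and the choice to treat any genotype containing `.` as a no-call are all assumptions made for illustration.

```python
import re


def alt_allele_frequency(genotypes, remove_no_calls=False):
    """Return (alt_count, total_alleles, frequency) for one site.

    Hypothetical helper: assumes diploid, VCF-style genotype strings
    such as "0/1" or "./.", where "0" is the reference allele.
    """
    alt_count = 0
    total_alleles = 0
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        if "." in alleles:
            # Assumption: any missing allele makes the genotype a no-call.
            # By default a no-call still adds two alleles to the total.
            if not remove_no_calls:
                total_alleles += 2
            continue
        total_alleles += len(alleles)
        alt_count += sum(1 for a in alleles if a != "0")
    frequency = alt_count / total_alleles if total_alleles else 0.0
    return alt_count, total_alleles, frequency


# Ten samples: two homozygous variant calls and eight no-calls.
genotypes = ["1/1", "1/1"] + ["./."] * 8
default_result = alt_allele_frequency(genotypes)
filtered_result = alt_allele_frequency(genotypes, remove_no_calls=True)
```

With these inputs the default total is 20 alleles and the frequency is 0.2, while removing no-calls shrinks the total to 4 alleles and raises the frequency to 1.0, which is the situation the warning above describes.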
Output
Allele Counts: Counts of each alternate allele for each site across all samples. In most cases, there is only a single alternate and so the count is the number of observations of this allele across all chromosomes of the samples.
For example, a homozygous variant for a sample gets a count of 2, while a heterozygous genotype gets a count of 1.
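Allele Frequencies: The Allele Counts divided by the total number of alleles (# Alleles). By default, each missing genotype is assumed to carry two alleles and so adds 2 to this total; selecting Remove No-Calls Genotypes excludes missing genotypes from the total.
# Alleles: Total number of alleles used as the denominator for Allele Frequencies. By default this includes two alleles for every no-call genotype; with Remove No-Calls Genotypes selected, only alleles in called genotypes are counted.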
# Hets: Count of the number of heterozygous genotypes across all samples.
# HomoVar: Count of the number of homozygous (or hemizygous) non-reference called genotypes across all samples.
# Samples: Count of the number of samples that have one or more variant alleles.
Sample Names: (Optional) The names of the samples containing a variant genotype (not reference or missing).
Homozygous Sample Names: (Optional) The names of the samples containing a homozygous variant genotype.
Heterozygous Sample Names: (Optional) The names of the samples containing a heterozygous variant genotype.
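As a rough illustration of how the per-site output fields relate to one another, the following sketch derives them from a mapping of sample names to genotypes. The function `summarize_site`, the dictionary keys, and the sample names are hypothetical. The sketch also assumes VCF-style genotype strings, treats any genotype containing `.` as missing, and counts a genotype with two different alleles (including two different alternates) as heterozygous.

```python
import re
from collections import Counter


def summarize_site(sample_genotypes):
    """Summarize one site from a {sample_name: genotype} mapping.

    Hypothetical helper, not the product's implementation.
    """
    allele_counts = Counter()  # observations of each alternate allele
    variant_names, het_names, hom_names = [], [], []
    for name, gt in sample_genotypes.items():
        alleles = re.split(r"[/|]", gt)
        if "." in alleles:
            continue  # missing genotype: no variant evidence
        alts = [a for a in alleles if a != "0"]
        if not alts:
            continue  # homozygous reference
        allele_counts.update(alts)
        variant_names.append(name)
        if len(set(alleles)) == 1:
            # All alleles identical and non-reference: homozygous
            # (or hemizygous, when only one allele is present).
            hom_names.append(name)
        else:
            het_names.append(name)
    return {
        "allele_counts": dict(allele_counts),
        "hets": len(het_names),
        "hom_var": len(hom_names),
        "samples": len(variant_names),
        "sample_names": variant_names,
        "homozygous_sample_names": hom_names,
        "heterozygous_sample_names": het_names,
    }


site = {"S1": "0/0", "S2": "0/1", "S3": "1/1", "S4": "./.", "S5": "1|0"}
summary = summarize_site(site)
```

Here the homozygous sample S3 contributes 2 to the allele count and the heterozygous samples S2 and S5 contribute 1 each, giving an allele count of 4 across 3 variant samples.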