# Meta-Analysis

Golden Helix SVS Meta Analysis performs a meta-analysis of results taken from two or more studies, each study containing individual GWAS or other analysis results for multiple markers. All results for each individual marker are meta-analyzed across all studies for which the marker has valid results. (You may select the minimum number of studies for which a marker must have valid results before it will be meta-analyzed.)

Each study’s results should be contained in a separate spreadsheet (or spreadsheet tab), and may either have been output from an SVS analysis feature or have been imported into SVS from external data.

Three types of analysis outputs are provided:

- Fixed-effects analysis outputs
- Tests for heterogeneity between studies
- Random-effects analysis outputs

In addition, the actual input data as well as some intermediate values that have been computed from the individual studies will be output (unless you select otherwise).

Other features provided by Golden Helix SVS Meta Analysis include:

- Genomic control for individual studies and/or the final results, and
- Strand correction.

## Meta-Analysis Overview

It is often desirable to combine the results of multiple studies of the same phenotype. The larger combined amount of data should provide more statistical power than any single study. One approach is simply to merge the data into one study with a much larger sample size.

Another approach is meta-analysis. Meta-analysis is a technique for combining and analyzing the data from multiple studies so as to:

- Determine if and how much of an effect is present with more certainty than can be determined from any individual study, and to
- Determine how and how much the findings vary from study to study.

To quote [Willer2010], “Meta-analysis allows for custom analysis of individual studies to conveniently account for population substructure, the presence of related individuals, study-specific covariates and many other ascertainment-related issues. It has been shown that meta-analysis of summary statistics is as efficient (in terms of statistical power) as pooling individual-level data across studies, but much less cumbersome.”

In SVS, meta-analysis may be performed for every individual marker containing valid data within multiple GWAS or other multiple-marker studies.

Note

The techniques used in this feature use a combination of the theory from [Willer2010] and [Magi2010].

## The Fixed-Effect Algorithm

### Inverse-Variance-Based Approach

This is the classic approach to meta-analysis. Effect sizes from different studies are combined in a weighted average, where the inverse of the estimated variance of the effect size from each study is used for the weighting of the effect size from that study.

#### The Algorithm

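If $\beta_{ij}$ is the effect size estimate for study $i$ for SNP $j$, and $\mathrm{SE}_{ij}$ is the standard error for estimating $\beta_{ij}$, we use

$$w_{ij} = \frac{1}{\mathrm{SE}_{ij}^2}$$

as the weighting for study $i$ at SNP $j$. In the formulas below, the sums run over the studies in which SNP $j$ has valid results.

The overall effect size $\beta_j$ for SNP $j$ may be computed as

$$\beta_j = \frac{\sum_i w_{ij}\,\beta_{ij}}{\sum_i w_{ij}}$$

and a variance and standard error for $\beta_j$ may be computed as

$$\mathrm{Var}(\beta_j) = \frac{1}{\sum_i w_{ij}}, \qquad \mathrm{SE}_j = \sqrt{\frac{1}{\sum_i w_{ij}}}$$

The overall p-value may be found through the Z-score approach as

$$Z_j = \frac{\beta_j}{\mathrm{SE}_j}, \qquad P_j = 2\left[1 - \Phi\left(|Z_j|\right)\right]$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution. Alternatively, it may be found through the $\chi^2$ approach using

$$\chi^2_j = \left(\frac{\beta_j}{\mathrm{SE}_j}\right)^2, \qquad P_j = \Pr\left(\chi^2_1 > \chi^2_j\right)$$

where the $\chi^2$ has one degree of freedom.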

Note

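If we assume that the overall effect size $\beta_j$ is equivalent to a coefficient from a logistic regression, we may define an overall odds ratio $\mathrm{OR}_j$, an overall lower confidence limit $L_j$, and an overall upper confidence limit $U_j$ as

$$\mathrm{OR}_j = e^{\beta_j}$$

and, for 95% confidence limits,

$$L_j = e^{\beta_j - 1.96\,\mathrm{SE}_j}, \qquad U_j = e^{\beta_j + 1.96\,\mathrm{SE}_j}$$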
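The computation above can be sketched in a few lines of Python. This is a minimal illustration for a single SNP, not SVS's implementation. The helper name `fixed_effect_ivw` is hypothetical, and the sketch assumes 95% confidence limits (critical value about 1.96). The two-sided p-value uses `math.erfc`, which equals $2[1-\Phi(|Z|)]$ and also equals the 1-df $\chi^2$ tail probability of $Z^2$. This means the Z-score and $\chi^2$ routes give the same p-value.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value (about 1.96); assumes 95% confidence limits.
Z_95 = NormalDist().inv_cdf(0.975)


def fixed_effect_ivw(betas, ses):
    """Hypothetical helper: inverse-variance fixed-effect meta-analysis of one SNP.

    betas: per-study effect sizes beta_ij (for example, log odds ratios)
    ses:   their standard errors SE_ij
    """
    weights = [1.0 / se ** 2 for se in ses]                     # w_ij = 1 / SE_ij^2
    sum_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / sum_w  # pooled beta_j
    se = math.sqrt(1.0 / sum_w)                                 # SE_j
    z = beta / se                                               # Z_j
    chi2 = z * z                                                # chi-square with 1 df
    # 2 * [1 - Phi(|Z|)], computed with erfc to avoid cancellation for tiny p-values.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "beta": beta, "se": se, "z": z, "chi2": chi2, "p": p,
        "or": math.exp(beta),                                   # OR_j
        "or_lower": math.exp(beta - Z_95 * se),                 # L_j
        "or_upper": math.exp(beta + Z_95 * se),                 # U_j
    }


result = fixed_effect_ivw([0.2, 0.4], [0.1, 0.2])
```

In this two-study example, the weights are 100 and 25. The pooled effect is therefore (100 × 0.2 + 25 × 0.4) / 125 = 0.24, and the $\chi^2$ is 0.24² × 125 = 7.2.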

#### Using Odds-Ratio and Confidence-Interval Values

If odds-ratio and confidence-interval outputs are available from a study, Golden Helix SVS will convert these to effect size and standard error values as follows:

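The log of the odds ratio is used as the effect size: $\beta_{ij} = \ln(\mathrm{OR}_{ij})$.

The lower and upper confidence interval endpoints $L_{ij}$ and $U_{ij}$ will be converted to the standard error value $\mathrm{SE}_{ij}$ according to the formula

$$\mathrm{SE}_{ij} = \frac{\ln(U_{ij}) - \ln(L_{ij})}{2 \times 1.96}$$

where 1.96 is the standard normal quantile for a 95% confidence interval.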
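The conversion above can be sketched as follows. It assumes a symmetric interval on the log-odds scale. The helper name `or_ci_to_beta_se` and its `level` parameter are hypothetical illustrations, not SVS options. A round trip (building an odds ratio and 95% CI from known values, then converting back) is a quick way to confirm that the formula inverts correctly.

```python
import math
from statistics import NormalDist


def or_ci_to_beta_se(odds_ratio, ci_lower, ci_upper, level=0.95):
    """Hypothetical helper: convert an odds ratio and its CI to (beta, SE).

    Assumes the interval is symmetric on the log-odds scale at the given level.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    beta = math.log(odds_ratio)                        # effect size = ln(OR)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    return beta, se


# Round trip: build an OR and 95% CI from beta = 0.3, SE = 0.1, then recover them.
z95 = NormalDist().inv_cdf(0.975)
beta, se = or_ci_to_beta_se(
    math.exp(0.3), math.exp(0.3 - z95 * 0.1), math.exp(0.3 + z95 * 0.1)
)
```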

### Sample-Size-Based Approach

An alternative (and more flexible) approach to comparing and combining effect sizes is to use the sample-size-based approach explained in [Willer2010]. For every study and marker, the p-value, effect direction, and sample size (for weighting purposes) are provided. From these, a Z-score and an overall p-value are calculated. Neither an actual effect size nor an estimate of an effect size variance is needed.

#### The Algorithm

If $N_i$ is the sample size of study $i$, $P_{ij}$ is the p-value from study $i$ for SNP $j$, and $\Delta_{ij}$ (either $+1$ or $-1$) is the direction of effect for study $i$ at SNP $j$, we use an intermediate statistic $Z_{ij}$, defined as

$$Z_{ij} = \Delta_{ij}\,\bar{\Phi}^{-1}\!\left(\frac{P_{ij}}{2}\right)$$

to describe the effect, where $\bar{\Phi}^{-1}$ represents the inverse of one minus the cumulative distribution function of the normal distribution (otherwise known as the inverse survival function of the normal distribution).

Note

With larger p-values (corresponding with smaller effects), $P_{ij}/2$ approaches one-half (from below) and $\bar{\Phi}^{-1}(P_{ij}/2)$ approaches zero. With smaller p-values (corresponding with larger effects), $P_{ij}/2$ approaches zero and $\bar{\Phi}^{-1}(P_{ij}/2)$ becomes large in the positive direction. The separate effect direction variable $\Delta_{ij}$ fills in the remaining information about the effect. This makes $Z_{ij}$ a variable which, under the null hypothesis, is normally distributed and centered around zero.

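Using $\sqrt{N_i}$ as the Z-score weight for study $i$, we can compute the overall Z-score for SNP $j$ as

$$Z_j = \frac{\sum_i \sqrt{N_i}\,Z_{ij}}{\sqrt{\sum_i N_i}}$$

and the overall p-value as

$$P_j = 2\,\bar{\Phi}\left(|Z_j|\right)$$

where $\bar{\Phi}$ is one minus the cumulative distribution function of the normal distribution (otherwise known as the survival function of the normal distribution).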
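The sample-size-weighted combination can be sketched for one SNP as shown below. This is an illustration under stated assumptions, not SVS code. The helper names `inverse_survival` and `sample_size_meta` are hypothetical. Directions are assumed to already be encoded as +1/-1, and p-values are assumed to be two-sided and to lie in (0, 1]. The inverse survival function is computed as $-\Phi^{-1}(q)$ by symmetry, which stays accurate for very small p-values.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_survival(q):
    """Inverse survival function of N(0, 1): the z with P(Z > z) = q, for 0 < q < 1."""
    return -_STD_NORMAL.inv_cdf(q)  # by symmetry; avoids computing 1 - q


def sample_size_meta(p_values, directions, sample_sizes):
    """Hypothetical helper: sample-size-weighted Z-score meta-analysis of one SNP.

    p_values:     two-sided p-values P_ij, each in (0, 1]
    directions:   effect directions Delta_ij, +1 or -1
    sample_sizes: N_i (or effective sample sizes)
    """
    numerator = 0.0
    sum_n = 0.0
    for p, direction, n in zip(p_values, directions, sample_sizes):
        z_i = direction * inverse_survival(p / 2.0)  # Z_ij
        numerator += math.sqrt(n) * z_i             # weight sqrt(N_i)
        sum_n += n                                  # sum of squared weights
    z = numerator / math.sqrt(sum_n)                # overall Z_j
    p_overall = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * survival(|Z_j|)
    return z, p_overall


z, p = sample_size_meta([0.05, 0.05], [1, 1], [100, 100])
```

Two equally sized studies that each report p = 0.05 in the same direction combine to $Z = 1.96\sqrt{2} \approx 2.77$, a smaller p-value than either study alone. Opposite directions would cancel to $Z = 0$.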

Note

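For case/control studies, an effective sample size may be used as the sample size in the sample-size-based approach. The effective sample size is computed as

$$N_{\mathrm{eff}} = \frac{4}{1/N_{\mathrm{cases}} + 1/N_{\mathrm{controls}}}$$

where $N_{\mathrm{cases}}$ is the number of cases and $N_{\mathrm{controls}}$ is the number of controls.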

(Note that if the number of cases and controls are equal, the effective sample size is just double the number of either cases or of controls and is equal to the actual sample size.)
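As a worked example of the effective sample size, consider an unbalanced study with 200 cases and 800 controls (1,000 samples). It contributes $4 / (1/200 + 1/800) = 4 / 0.00625 = 640$, less than its actual size. A balanced study of 500 cases and 500 controls contributes its full 1,000. The one-line helper below is a hypothetical illustration of the formula.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size 4 / (1/N_cases + 1/N_controls) for a case/control study."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


unbalanced = effective_sample_size(200, 800)
balanced = effective_sample_size(500, 500)
```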

#### Reconciling with the Inverse Variance Approach

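The inverse-variance formula for an overall Z-score,

$$Z_j = \frac{\beta_j}{\mathrm{SE}_j}$$

with

$$\beta_j = \frac{\sum_i w_{ij}\,\beta_{ij}}{\sum_i w_{ij}}$$

and

$$\mathrm{SE}_j = \frac{1}{\sqrt{\sum_i w_{ij}}}$$

may be rewritten as

$$Z_j = \frac{\sum_i w_{ij}\,\beta_{ij}}{\sqrt{\sum_i w_{ij}}}$$

Meanwhile, the overall Z-score from the sample-size-based approach is

$$Z_j = \frac{\sum_i \sqrt{N_i}\,Z_{ij}}{\sqrt{\sum_i N_i}}$$

We first define an effect-size weighting for the sample-size-based approach that sets the denominator terms of the two forms of $Z_j$ equal to each other, that is,

$$w_{ij} = N_i \quad \left(\text{equivalently, } \mathrm{SE}_{ij} = \frac{1}{\sqrt{N_i}}\right)$$

We then set the numerator terms of the two forms of $Z_j$ equal to each other as

$$w_{ij}\,\beta_{ij} = N_i\,\beta_{ij} = \sqrt{N_i}\,Z_{ij}$$

by defining

$$\beta_{ij} = \frac{Z_{ij}}{\sqrt{N_i}}$$

as the p-value-based effect size for the sample-size-based approach.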
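A quick numerical check of this reconciliation is shown below. Each study's p-value, direction, and sample size are turned into a p-value-based effect size $\beta_{ij} = Z_{ij}/\sqrt{N_i}$ and standard error $1/\sqrt{N_i}$. Feeding those into the inverse-variance Z formula should reproduce the sample-size-based Z exactly, up to floating-point rounding. All function names and the example inputs are hypothetical illustrations.

```python
import math
from statistics import NormalDist


def p_value_effect_sizes(p_values, directions, sample_sizes):
    """Per-study (beta_ij, SE_ij) implied by the sample-size approach."""
    pairs = []
    for p, direction, n in zip(p_values, directions, sample_sizes):
        z_i = -direction * NormalDist().inv_cdf(p / 2.0)        # Delta_ij * isf(P_ij / 2)
        pairs.append((z_i / math.sqrt(n), 1.0 / math.sqrt(n)))  # (beta_ij, SE_ij)
    return pairs


def inverse_variance_z(betas, ses):
    """Z_j = sum(w_ij * beta_ij) / sqrt(sum(w_ij)), with w_ij = 1 / SE_ij^2."""
    weights = [1.0 / s ** 2 for s in ses]
    return sum(w * b for w, b in zip(weights, betas)) / math.sqrt(sum(weights))


def sample_size_z(p_values, directions, sample_sizes):
    """Z_j = sum(sqrt(N_i) * Z_ij) / sqrt(sum(N_i))."""
    zs = [-d * NormalDist().inv_cdf(p / 2.0) for p, d in zip(p_values, directions)]
    numerator = sum(math.sqrt(n) * z for n, z in zip(sample_sizes, zs))
    return numerator / math.sqrt(sum(sample_sizes))


p_vals, dirs, sizes = [0.01, 0.2, 0.5], [1, -1, 1], [1000, 400, 2500]
pairs = p_value_effect_sizes(p_vals, dirs, sizes)
z_iv = inverse_variance_z([b for b, _ in pairs], [s for _, s in pairs])
z_ss = sample_size_z(p_vals, dirs, sizes)
```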

Note

Even though we have derived effect sizes and weightings that are similar in form to those used in the inverse-variance approach, we cannot really compare these directly with effect sizes and weightings imported directly from study results, since we do not know (in general) what algorithm was used to derive the p-values.

Therefore, for any single meta-analysis, it is (still) necessary to choose between sample-size-based meta-analysis vs. inverse-variance-based meta-analysis.

However, we will use our p-value-based effect sizes and sample-size-based weightings (as translated to effect size weightings) to check for heterogeneity between studies and to check for random effects.

## Heterogeneity Between Studies

### Testing for Heterogeneity

To test for consistency of the estimated effects across studies at the same SNP, two summary statistics are calculated:

Cochran’s Q, which is calculated as the weighted sum of squared differences between individual study effects and the pooled effect across studies, with the weights being those used in the pooling method:

$$Q_j = \sum_i w_{ij}\left(\beta_{ij} - \beta_j\right)^2$$

This statistic has an approximate $\chi^2$ distribution with $k_j - 1$ degrees of freedom, where $k_j$ denotes the number of studies for which SNP $j$ has valid results.

It has been argued that Q has too little power when there are few studies and too much power when there are a large number of studies.

The $I^2$ statistic is defined as

$$I_j^2 = \max\left(0,\ \frac{Q_j - (k_j - 1)}{Q_j}\right)$$

and describes the fraction of variation across studies that is due to heterogeneity rather than chance.
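Both heterogeneity statistics can be computed from the per-study effect sizes and standard errors with a short sketch. The helper name `heterogeneity` is hypothetical. The sketch truncates $I^2$ at zero as in the formula above. If SciPy is available, the p-value for Q is the upper tail of the $\chi^2$ distribution, `scipy.stats.chi2.sf(q, df)`. It is left out here to keep the example dependency-free.

```python
def heterogeneity(betas, ses):
    """Hypothetical helper: Cochran's Q, its degrees of freedom, and I^2 for one SNP."""
    weights = [1.0 / s ** 2 for s in ses]                        # w_ij = 1 / SE_ij^2
    sum_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum_w  # fixed-effect beta_j
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1                                          # k_j - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0         # truncated at zero
    return q, df, i_squared


q, df, i2 = heterogeneity([0.1, 0.9], [0.1, 0.1])
```

Consider two equally precise studies with effects 0.1 and 0.9. Each is 0.4 away from the pooled 0.5, so $Q = 2 \times 100 \times 0.4^2 = 32$ on 1 degree of freedom and $I^2 = 31/32 \approx 0.97$. Almost all of the observed variation is attributed to heterogeneity.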

A higher level of heterogeneity between studies suggests that the random-effects outputs should be considered rather than the fixed-effects outputs.

### Random Effects Meta-Analysis

According to [Magi2010], “If there is heterogeneity of effects between studies, it is common to perform random-effects meta-analysis in order to correct the deflation in the variance of the fixed-effects estimate.”

The random-effects variance components, one for each marker, are computed using the DerSimonian-Laird (DL) approach, which is commonly used for meta-analysis, and which avoids having to use REML or REML-related approaches such as EMMA, etc., for every SNP (which would be excessively compute-intensive).

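Using DL, the random-effects variance component for SNP $j$ is given by

$$\tau_j^2 = \max\left(0,\ \frac{Q_j - (k_j - 1)}{\sum_i w_{ij} - \dfrac{\sum_i w_{ij}^2}{\sum_i w_{ij}}}\right)$$

and using random-effects analysis, the combined effect over all studies at marker $j$ and its standard error are given by

$$\beta_j^{*} = \frac{\sum_i w_{ij}^{*}\,\beta_{ij}}{\sum_i w_{ij}^{*}}, \qquad \mathrm{SE}_j^{*} = \frac{1}{\sqrt{\sum_i w_{ij}^{*}}}$$

where

$$w_{ij}^{*} = \frac{1}{\mathrm{SE}_{ij}^2 + \tau_j^2}$$

The variance component $\tau_j^2$ is used above “to inflate the variance of the estimated allelic effect in each study.”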
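A compact sketch of the DerSimonian-Laird computation for one SNP is given below. The function name `dersimonian_laird` is hypothetical and the code is illustrative, not SVS's implementation. It reuses the fixed-effect weights to get Q, converts Q into $\tau^2$, and then re-weights each study by $1/(\mathrm{SE}_{ij}^2 + \tau^2)$.

```python
import math


def dersimonian_laird(betas, ses):
    """Hypothetical helper: DerSimonian-Laird random-effects meta-analysis of one SNP.

    Returns (tau2, beta_re, se_re).
    """
    w = [1.0 / s ** 2 for s in ses]                              # fixed-effect weights w_ij
    sum_w = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    df = len(betas) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0              # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]                # inflated-variance weights
    sum_ws = sum(w_star)
    beta_re = sum(ws * b for ws, b in zip(w_star, betas)) / sum_ws
    se_re = math.sqrt(1.0 / sum_ws)
    return tau2, beta_re, se_re


tau2, beta_re, se_re = dersimonian_laird([0.1, 0.9], [0.1, 0.1])
```

For the heterogeneous pair of studies above, $\tau^2 = (32 - 1)/100 = 0.31$. The random-effects standard error grows from about 0.07 (fixed effect) to 0.4, which is the variance inflation the quotation describes. When $Q \le k_j - 1$, $\tau^2$ is zero and the random-effects result equals the fixed-effect one.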

### Genomic Control

Golden Helix SVS Meta Analysis offers its own Genomic Control, both for individual studies and for the overall result. This technique computes an inflation factor $\lambda$ as the median of the observed test statistics divided by the median expected under the null hypothesis. (If the test statistic is distributed as $\chi^2$ with one degree of freedom, that expected median is approximately 0.456.) The variance of each SNP's effect estimate is then inflated by the relevant Genomic Control inflation factor. For Genomic Control on an individual study $i$, $\mathrm{SE}_{ij}^2 \leftarrow \lambda_i\,\mathrm{SE}_{ij}^2$. For Genomic Control across the meta-analysis, $\mathrm{SE}_j^2 \leftarrow \lambda\,\mathrm{SE}_j^2$, where $\lambda$ is the Genomic Control inflation factor computed over all meta-analyzed association test statistics, genome-wide. Equivalently, each corresponding $\chi^2$ statistic is divided by the inflation factor.
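The Genomic Control step can be sketched as follows, using the 0.456 null median quoted above. The helper names are hypothetical. Some Genomic Control implementations never let $\lambda$ drop below 1, so that results are never made more significant. The text does not say whether SVS does this, so the sketch applies $\lambda$ exactly as computed.

```python
import math
from statistics import median

# Median of a chi-square (1 df) statistic under the null, as quoted in the text.
EXPECTED_MEDIAN_CHI2_1DF = 0.456


def genomic_control_lambda(chi2_stats):
    """Hypothetical helper: inflation factor = observed median chi-square / null median."""
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2_1DF


def apply_genomic_control(betas, ses, lam):
    """Inflate each variance by lam (so each SE is scaled by sqrt(lam)).

    Returns (ses_gc, chi2_gc); each chi2_gc value equals the original (b / s)^2 / lam.
    """
    ses_gc = [s * math.sqrt(lam) for s in ses]
    chi2_gc = [(b / s) ** 2 for b, s in zip(betas, ses_gc)]
    return ses_gc, chi2_gc


lam = genomic_control_lambda([0.1, 0.456, 5.0])
ses_gc, chi2_gc = apply_genomic_control([0.4], [0.1], 4.0)
```

With $\lambda = 4$, a SNP whose uncorrected $\chi^2$ was $(0.4/0.1)^2 = 16$ ends up with $16/4 = 4$ after correction.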

For more information about Genomic Control elsewhere in Golden Helix SVS,
see *Correcting for Stratification by Genomic Control*.

## Strand Correction

SVS will perform sign and/or strand correction if the information identifying the test allele, the non-test allele, and/or the strand is available for the studies being analyzed. For example, suppose the test and non-test alleles for marker123 are ‘A’ and ‘T’, respectively, in Study Number 1, and ‘T’ and ‘A’, respectively, in Study Number 2, with the strand being ‘+’ for marker123 in both studies. The sign of the effect for marker123 in Study Number 2 will then be switched before it is used in the meta-analysis calculations. The switched effect agrees with the result Study Number 2 would have produced had it tested the same allele for marker123 that Study Number 1 tests (allele ‘A’).

Note

None of these fields (test allele, non-test allele, or strand information) is required for SVS to perform meta-analysis. If all of these fields are missing, it will be assumed that no sign/strand correction is needed. If some of these fields are present and the rest are missing, Golden Helix SVS Meta Analysis will make a best effort to use the correct sign for each individual study effect size.
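The core idea of sign correction can be illustrated with a simplified, hypothetical sketch. It is not SVS's actual logic, which also handles partially missing information on a best-effort basis. The sketch assumes that every study supplies all three fields, that alleles are single letters A, C, G, or T, and that strand is "+" or "-". Alleles on the opposite strand are complemented first. A swapped test/other pair then flips the sign, and any other mismatch is reported as inconsistent. The function name `align_to_reference` and its return labels are assumptions chosen to mirror the "(Reversed)" and "(Inconsistent)" labels described under Outputs.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_to_reference(ref_alleles, alleles, effect):
    """Hypothetical sketch: align one study's effect to a reference study.

    ref_alleles, alleles: (test_allele, other_allele, strand) tuples.
    Returns (aligned_effect, status), where status is "ok", "reversed" or "inconsistent".
    """
    ref_test, ref_other, ref_strand = ref_alleles
    test, other, strand = alleles
    if strand != ref_strand:
        # Opposite strands: express this study's alleles on the reference strand.
        test, other = COMPLEMENT[test], COMPLEMENT[other]
    if (test, other) == (ref_test, ref_other):
        return effect, "ok"
    if (test, other) == (ref_other, ref_test):
        return -effect, "reversed"  # same SNP, opposite coded allele: flip the sign
    return effect, "inconsistent"


# The marker123 example from the text: alleles swapped, same strand, so the sign flips.
aligned = align_to_reference(("A", "T", "+"), ("T", "A", "+"), 0.3)
```

A/T and C/G SNPs are their own complements, so in practice strand labels are needed to align them. This sketch trusts the supplied strand field for that.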

## Performing Meta-Analysis

### Data Requirements

Each study’s results should be contained in a separate spreadsheet (or spreadsheet tab), and may either have been output from an SVS analysis feature or have been imported into SVS from external data. Test allele, non-test allele, and strand fields may be located either in the spreadsheet’s marker map or in columns of the spreadsheet itself. Other result data must be contained in columns of the spreadsheet itself.

If you wish to propagate marker map information to the result spreadsheet, all study spreadsheets should be marker-mapped.

However, even if you do not propagate marker map information, test allele, non-test allele, and strand fields may still be contained in the marker map of any individual study spreadsheet which has a marker map.

### Usage and Parameters

Within an open project, and in the **Tools** menu, select
**Tools->Perform Meta-Analysis**. The initial dialog for selecting
study spreadsheets (*Meta-Analysis Initial Window*) will come up.

#### The Initial Dialog

Five options may additionally be specified in this window.

- **Markers must have valid results in at least ___ studies**: A marker must be valid in at least two studies in order to be analyzed. However, you can require a higher number, up to the number of studies you are analyzing.
- **Propagate marker map information from study spreadsheets**: If this option is checked, the output spreadsheet will be marker-mapped and sorted according to chromosome and position. Additionally:
  - Every marker’s chromosome information must agree between studies, and
  - If there is disagreement about any marker’s position, you will be given an option in the dialog as to whether to continue. If you do continue, the minimum position among all studies will be used for every marker with a position disagreement.

  If this option is unchecked, the result spreadsheet will not be marker-mapped, and will be sorted alphabetically by marker name.
- **Output expected P and -log10(Expected P) for the final results**: These will be computed from the final p-values themselves.
- **Additionally output statistics from individual studies**: If this is unchecked, only the overall statistics will be output.
- **Perform Genomic Control on the entire meta-analysis (Chi-Squared dist. with 1 degree of freedom assumed)**: See *Genomic Control* for more information about this option.

After selecting your study spreadsheets and changing any additional
options, click the **Next >** button to select options about the first
study you have specified (*Meta-Analysis Window for the First Study* and *Meta-Analysis Window for the First Study with Alternate Tabs Showing*).

#### Dialog for the First Study

Four sets of options will show in this window:

- Where to obtain marker labels.
- Test allele and strand information (optional).
- Selection of fields containing data to be meta-analyzed.
- A checkbox for optional genomic control for this study.

Note

Golden Helix SVS Meta-Analysis requires you to enter a number of fields for each of the studies being analyzed, which can be time-consuming. To reduce this effort, Golden Helix SVS Meta-Analysis makes its best effort to deduce sensible defaults for every study spreadsheet field that needs to be entered, for every study. These defaults are based on field names that association tests typically produce and, for studies after the first, on the names of the fields entered for the previous study.

#### Where to Obtain Marker Labels

This will usually default to the spreadsheet row labels. This is where
most SVS association test features put marker names. If
neither the row label column nor any other column has been set as a
default, you will be prompted to **Select marker name column [or other
result label column]** in this field.

#### Test Allele and Strand Information (Optional)

If your first study spreadsheet is marker-mapped, you may select test allele and/or strand information from either the marker map or the spreadsheet itself. If the spreadsheet is not marker-mapped, you may still select any test allele and/or strand information that is in the spreadsheet itself.

If the spreadsheet is marker-mapped, two tabs will appear, one for selecting from marker map fields, the other for selecting from spreadsheet columns. In the marker map tab, it will be possible to select the following fields:

- **Test allele marker map field**
- **Other allele marker map field**
- **Strand marker map field**

In the spreadsheet tab, it will be possible to select from the following fields:

- **Test allele column**
- **Other allele column**
- **Strand column**

If the spreadsheet is not marker-mapped, the above three column selection fields will appear with no tabs showing.

#### Selection of Fields Containing Data To Be Meta-Analyzed

Two tabs will appear for selecting fields from the first study–one for each of the two approaches:

- **Sample-Size-Based Inputs**
- **Inverse-Variance-Based Inputs**

#### Selection of Sample-Size-Based Inputs

If you wish to analyze all of your studies using the sample-size-based
approach, check the **Use Sample-Size-Based Inputs** box within the
**Sample-Size-Based Inputs** tab. Then enter the p-value field
(**Select p-value column** if there is no default), the effect
direction field (**Select effect direction column** if there is no
default), and an indication of the number of samples. There are four
ways to indicate the number of samples:

- **Actual number of samples by marker**: If your spreadsheet has this information in one of its columns, check this box and select the column using the column selector.
- **Actual number of samples overall**: Otherwise, check this box and enter the total number of samples in the blank.
- **Effective number of samples by marker, computed from:** For case/control studies, if you have columns for the number of cases and number of controls, respectively, check this box and select those columns using the column selectors.
- **Effective number of samples overall, computed from:** For case/control studies, if you do not have columns for the number of cases and number of controls, you may check this box and enter the number of cases and number of controls in their respective blanks.

#### Selection of Inverse-Variance-Based Inputs

If you wish to analyze all of your studies using the
inverse-variance-based approach, select the **Inverse-Variance-Based
Inputs** tab. Two checkboxes will show:

- **Use Inverse-Variance-Based (Effect Size) Inputs**: Check this box to use the effect size approach, in which you can **Select effect size column** and **Select standard error column**.
- **Use Inverse-Variance-Based (Odds Ratio) Inputs**: Check this box to use the odds ratio/confidence interval approach, in which you can **Select odds ratio column**, **Select column for the CI lower bound**, and **Select column for the CI upper bound**.

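Studies that all use the inverse-variance approach may mix effect-size inputs and odds-ratio inputs. For example, one study may supply effect sizes and standard errors while another supplies odds ratios and confidence intervals.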

#### Genomic Control for This Study

Check **Use Genomic Control for this study (Chi-Squared dist. with 1
degree of freedom assumed)** to use Genomic Control on the results
from this study before these results are meta-analyzed. (See
*Genomic Control*.)

#### Windows for Succeeding Studies

After selecting your options for the first study, click the **Next >**
button to select options for the next study you have specified. If you
are using the sample-size-based approach for your studies, the window
for that next study and all succeeding studies will look like
*Meta-Analysis Window for Succeeding Studies (Sample-Size-Based)*. For the inverse-variance-based approach, the
windows will look like *Meta-Analysis Window for Succeeding Studies (Inverse-Variance-Based)*. Except for the exclusion
of fields for the approach you are not using, all fields will be the
same as explained above, including clicking **Next >** to move to the
next study.

When you reach the options window for the last study you have
selected, there will be an **OK** button rather than a **Next >**
button. After selecting the options for the last study, click **OK**
to begin meta-analysis.

#### Outputs

When meta-analysis is finished, an output spreadsheet will be created,
containing the marker names as row labels. If you asked to **Propagate
marker map information from study spreadsheets**, the output
spreadsheet will have a marker map containing Chromosome and Position
fields. Otherwise, the output spreadsheet will not be marker mapped
and will be sorted by marker name.

The output spreadsheet will contain the following sets of columns:

- P-value summary:
  - *Overall P-Value*: The p-value resulting from the entire meta-analysis.
  - *-log10(Overall P-Value)*: The negative log (base 10) of the overall p-value.
  - *Expected Overall P-Value*: If you selected **Output expected P and -log10(Expected P) for the final results**, this and the next output will appear. This output is obtained by ranking the p-values of all the markers and distributing the expected p-values uniformly over those ranks.
  - *-log10(Expected Overall P)*: The negative log (base 10) of the expected overall p-value.
  - *Overall P-Value (GC)*: If you selected **Perform Genomic Control on the entire meta-analysis...**, this and the next one or three outputs will appear. This output shows the p-value resulting from performing genomic control over the final analysis results.
  - *-log10(Overall P-Value) (GC)*: The negative log (base 10) of the overall genomically-controlled p-value.
  - *Expected Overall P-Value (GC)*: The expected overall genomically-controlled p-value.
  - *-log10(Expected Overall P) (GC)*: The negative log (base 10) of the expected overall genomically-controlled p-value.
  - *RE Analysis P-Value*: The p-value obtained from random-effects analysis.
  - *-log10(RE Analysis P-Value)*: The negative log (base 10) of the random-effects p-value.
  - *Expected RE Analysis P-Value*: The expected random-effects analysis p-value.
  - *-log10(Expected RE Analysis P)*: The negative log (base 10) of the expected random-effects analysis p-value.

- Outputs for fixed-effects analysis:
  - *Studies for which Valid*: The number of studies $k_j$ for which this marker has valid data.
  - *Overall Effect Size*: The overall effect size $\beta_j$. See *The Algorithm*.
  - *Overall Standard Error*: The overall standard error $\mathrm{SE}_j$. See *The Algorithm*.
  - *Overall Odds Ratio*: The overall odds ratio $\mathrm{OR}_j$, assuming that the overall effect size is equivalent to a coefficient from a logistic regression. See *The Algorithm*.
  - *Overall Lower Conf. Limit*: The overall lower confidence limit $L_j$, assuming that the overall effect size is equivalent to a coefficient from a logistic regression. See *The Algorithm*.
  - *Overall Upper Conf. Limit*: The overall upper confidence limit $U_j$, assuming that the overall effect size is equivalent to a coefficient from a logistic regression. See *The Algorithm*.
  - *Overall Z*: The overall Z-score $Z_j$. See *The Algorithm*.
  - *Overall Chi-Squared*: The overall chi-square $\chi^2_j$. See *The Algorithm*.
  - *Overall Chi-Squared (GC)*: If you selected **Perform Genomic Control on the entire meta-analysis...**, this output will appear, containing the genomically-controlled overall chi-square value. See *Genomic Control*.

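- Heterogeneity-related outputs (see *Testing for Heterogeneity*):
  - *Cochran’s Q*: The heterogeneity statistic $Q_j$.
  - *P-Value from Cochran’s Q*: The p-value of $Q_j$ under a $\chi^2$ distribution with $k_j - 1$ degrees of freedom.
  - *I Squared*: The fraction of variation across studies that is due to heterogeneity rather than chance.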

- Random-effects-related outputs (see *Random Effects Meta-Analysis*):
  - *Tau Squared*: The random-effects variance component $\tau_j^2$.
  - *RE Analysis Effect Size*: The combined effect size $\beta_j^{*}$.
  - *RE Analysis Standard Error*: The standard error $\mathrm{SE}_j^{*}$ of the combined effect size.
  - *RE Analysis Odds Ratio*: The combined odds ratio, assuming that the overall and combined effect sizes are equivalent to coefficients from logistic regressions.
  - *RE Analysis Lower CL*: The combined lower confidence limit, under the same assumption.
  - *RE Analysis Upper CL*: The combined upper confidence limit, under the same assumption.
  - *RE Analysis Z*: The Z-score of the combined effect size.
  - *RE Analysis Chi Square*: The chi-square value of the combined effect size.

- Outputs for each individual study under the sample-size-based approach. These will appear only if you have selected **Additionally output statistics from individual studies** and you used the sample-size-based approach for meta-analysis. The outputs for each study are as follows, with the study number always indicated at the beginning of each header:
  - *Study # __: SS Row Number*: The row number in this study’s spreadsheet that contains information about this marker.
  - *Study # __: Test A./Other A./Strand*: If you have specified any strand-correction information for any of the studies, this field will appear for all of the studies, with "?" (missing value) displayed wherever this information was not present or was not provided. The test allele (or "?"), the other allele (or "?"), and the strand (or "?") will all be shown, separated by slashes. Nothing further is shown in two cases: when this is the first study for which this marker is valid, and when this study and a previous study provided enough information for strand correction and no correction was needed. Otherwise, the word "(Reversed)" (or the word "(Inconsistent)") will be shown to the left of the test allele/other allele/strand display.
  - *Study # __: <P-Value Column Name>*: Echoes the selected p-value column.
  - *Study # __: <Effect Direction Column Name> (Effect Dir.)*: Echoes the selected effect direction column.
  - *Study # __: Individual Z Value*: Reports the Z-score value $Z_{ij}$ after strand correction.
  - *Study # __: Sample Size*: Shown if the overall sample size was entered.
  - *Study # __: Effective Sample Size*: Shown if the overall number of cases and overall number of controls were entered.
  - *Study # __: <Number of Samples Column Name>*: Shown if a number-of-samples column was selected.
  - *Study # __: <Number of Cases Column Name>*: Shown if columns were selected for number of cases and number of controls.
  - *Study # __: <Number of Controls Column Name>*: Shown if columns were selected for number of cases and number of controls.
  - *Study # __: Effective Sample Size*: Shown if columns were selected for number of cases and number of controls.
  - *Study # __: Effect Size (from P-Value)*: The p-value-based effect size $\beta_{ij}$. See *Reconciling with the Inverse Variance Approach*.
  - *Study # __: Standard Error*: The corresponding standard error $\mathrm{SE}_{ij}$. See *Reconciling with the Inverse Variance Approach*.
  - *Study # __: Individual Chi-Square (Pre GC)*: If you selected **Use Genomic Control for this study...**, this and the next three columns will appear. The individual chi-square value (before genomic control) is computed as $Z_{ij}^2$.
  - *Study # __: Individual Chi-Square (GC)*: The individual chi-squared value after genomic control has been applied. See *Genomic Control*.
  - *Study # __: Weights Used (Pre GC)*: This is $w_{ij}$. See *The Algorithm*.
  - *Study # __: Weights Used (GC)*: This is $w_{ij}$ after genomic control has been applied, that is, $w_{ij}/\lambda_i$. See *Genomic Control*.
  - *Study # __: Weights Used*: If you have NOT selected **Use Genomic Control for this study...**, this column, which is simply $w_{ij}$, will appear. See *The Algorithm*.
  - *Study # __: RE Analysis Weights*: The weights $w_{ij}^{*}$ used for random-effects analysis.

- Outputs for each individual study under the inverse-variance-based approach. These will appear only if you have selected **Additionally output statistics from individual studies** and you used the inverse-variance-based approach for meta-analysis. The outputs for each study are as follows, with the study number always indicated at the beginning of each header:
  - *Study # __: SS Row Number*: The row number in this study’s spreadsheet that contains information about this marker.
  - *Study # __: Test A./Other A./Strand*: If you have specified any strand-correction information for any of the studies, this field will appear for all of the studies, with "?" (missing value) displayed wherever this information was not present or was not provided. The test allele (or "?"), the other allele (or "?"), and the strand (or "?") will all be shown, separated by slashes. Nothing further is shown in two cases: when this is the first study for which this marker is valid, and when this study and a previous study provided enough information for strand correction and no correction was needed. Otherwise, the word "(Reversed)" (or the word "(Inconsistent)") will be shown to the left of the test allele/other allele/strand display.
  - *Study # __: <Effect Size column header> (Effect Size)*: Shown if you used **Effect Size** inputs. Echoes the effect size column. See *The Algorithm*.
  - *Study # __: <Standard Error column header>*: Shown if you used **Effect Size** inputs. Echoes the standard error column. See *The Algorithm*.
  - *Study # __: <Odds Ratio Column Header>*: Shown if you used **Odds Ratio** inputs. Echoes the odds ratio column.
  - *Study # __: <Confidence Interval Lower Bound Column Header>*: Shown if you used **Odds Ratio** inputs. Echoes the confidence interval lower bound column.
  - *Study # __: <Confidence Interval Upper Bound Column Header>*: Shown if you used **Odds Ratio** inputs. Echoes the confidence interval upper bound column.
  - *Study # __: Effect Size*: Shown if you used **Odds Ratio** inputs. The computed effect size $\beta_{ij}$. See *Using Odds-Ratio and Confidence-Interval Values*.
  - *Study # __: Standard Error*: Shown if you used **Odds Ratio** inputs. The computed standard error $\mathrm{SE}_{ij}$. See *Using Odds-Ratio and Confidence-Interval Values*.
  - *Study # __: Individual Z Value*: Reports the Z-score value $\beta_{ij}/\mathrm{SE}_{ij}$ after strand correction. See *The Algorithm*.
  - *Study # __: Individual Chi-Square (Pre GC)*: If you selected **Use Genomic Control for this study...**, this and the next three columns will appear. The individual chi-square value (before genomic control) is computed as $(\beta_{ij}/\mathrm{SE}_{ij})^2$.
  - *Study # __: Individual Chi-Square (GC)*: The individual chi-squared value after genomic control has been applied. See *Genomic Control*.
  - *Study # __: Weights Used (Pre GC)*: This is $w_{ij}$. See *The Algorithm*.
  - *Study # __: Weights Used (GC)*: This is $w_{ij}$ after genomic control has been applied, that is, $w_{ij}/\lambda_i$. See *Genomic Control*.
  - *Study # __: Weights Used*: If you have NOT selected **Use Genomic Control for this study...**, this column, which is simply $w_{ij}$, will appear. See *The Algorithm*.
  - *Study # __: RE Analysis Weights*: The weights $w_{ij}^{*}$ used for random-effects analysis.

Note

A forest plot can be generated from the results. It is recommended that the results be sorted such that the most significant results are at the top.

To generate the plots, go to **Plot > Meta-Analysis Forest
Plot**.

See *Meta-Analysis Forest Plot* for more information.