5. Verify Solution with Q-Q Plots

Finally, you can confirm that the chosen number of components gives a good solution by plotting the observed -log10 p-values against the expected -log10 p-values derived from each marker's rank (a Q-Q plot).
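In equation form, the points on a Q-Q plot can be written as follows. This assumes the common convention that the i-th smallest of n null p-values is expected at i/(n + 1); SVS may use a slightly different formula for its expected column.

```latex
x_i = -\log_{10}\!\left(\frac{i}{n+1}\right), \qquad
y_i = -\log_{10} p_{(i)}, \qquad i = 1, \dots, n,
\qquad \text{where } p_{(1)} \le p_{(2)} \le \dots \le p_{(n)}
```

Under the null hypothesis, p_(i) is close to i/(n + 1), so y_i is close to x_i and the points follow the y = x line. Points above the line indicate inflated test statistics, and points below it indicate deflated ones.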

A. Generate Q-Q Plot

  • Open the Association Tests results spreadsheet that was generated using the optimal number of principal components determined in the previous step. In this example, it is the Association Tests - PCA on predictors (Center by Marker) PCA 31 spreadsheet.
  • Activate all rows by selecting Select > Row > Activate All Rows.
  • Next, generate the plot by going to Plot > XY Scatter Plots.
  • For the independent axis (in the left box) select Corr/Trend expected –log10 P, and for the dependent axis (in the right box) select Corr/Trend –log10 P.
  • Click Plot.

This will generate a Q-Q Plot of the observed vs. expected values. See Figure 5a.
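The data behind this scatter plot can be sketched in plain Python. This is an illustration only: the function name `qq_points` is hypothetical, and the expected values use the rank / (n + 1) convention, which is an assumption rather than a confirmed description of how SVS fills its expected column.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10 p pairs, one per test.

    X values play the role of the expected -log10 P column and Y values the
    observed -log10 P column. Expected values assume rank / (n + 1).
    All p-values must be greater than 0.
    """
    n = len(p_values)
    ranked = sorted(p_values)  # rank 1 = smallest p-value = largest -log10 p
    return [
        (-math.log10(rank / (n + 1)), -math.log10(p))
        for rank, p in enumerate(ranked, start=1)
    ]

# Example: four hypothetical p-values.
points = qq_points([0.2, 0.03, 0.5, 0.8])
```

Sorting both axes by rank is what makes this a quantile-quantile comparison: the k-th most significant observed result is paired with the k-th most extreme value expected by chance.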

Figure 5a. Plot Viewer – Q-Q plot of observed versus expected -log10 P values.

  • When the plot viewer opens, click on Graph1 in the Graph Control Interface on the left. The Graph control panel will become visible.
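  • Click the Add Item tab, select f(x) = m(x) + b from the list, and click Add. With slope m = 1 and intercept b = 0, this line is y = x, the line against which the observed p-values are compared.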

Comparing the Q-Q plots after correcting for 31 and 38 components shows that 31 components yields the better result, because the observed p-values more closely follow the y = x line (see Figure 5b, Figure 5c and Figure 5d). In studies without T-cell artifacts, the numbers of components suggested by the slope and by the F statistic are likely to be closer together. Whichever number of principal components you select, examine the Q-Q plot to verify that it yields a good solution.
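To put a number on "more closely follows the y = x line", one option is the least-squares slope of observed on expected -log10 p within the bulk of the distribution, choosing the component count whose slope is nearest 1. This is an illustrative metric, not necessarily the slope referred to above; the function names and the default window `x_max = 2.0` (taken from the range described for Figure 5b) are assumptions.

```python
def qq_slope(points, x_max=2.0):
    """Least-squares slope of observed on expected -log10 p for x <= x_max.

    points is a list of (expected, observed) pairs. Restricting the fit to
    x <= x_max (a hypothetical window) keeps it to the bulk of markers,
    which should follow the null.
    """
    window = [(x, y) for x, y in points if x <= x_max]
    n = len(window)
    if n < 2:
        raise ValueError("need at least two points in the window")
    mean_x = sum(x for x, _ in window) / n
    mean_y = sum(y for _, y in window) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in window)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in window)
    return sxy / sxx

def closest_to_identity(results, x_max=2.0):
    """Return the component count whose Q-Q slope is nearest 1.

    results maps a number of principal components to its (expected, observed) points.
    """
    return min(results, key=lambda k: abs(qq_slope(results[k], x_max) - 1.0))
```

A slope near 1 means the bulk of the observed p-values tracks the expected ones; a slope well below 1 corresponds to the points sagging under the line.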

Based on these plots, 31 components should be used to correct this dataset. The first 31 principal components can now be applied to all markers, including markers on non-autosomal chromosomes, as long as the Center data by marker option is used.
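Conceptually, the correction can be sketched as follows, assuming it works like standard EIGENSTRAT-style PCA correction: each marker is centered across samples, then its projection onto each of the first k sample eigenvectors is subtracted. The function names and the samples-by-markers list layout are hypothetical, and the internal SVS implementation may differ.

```python
def center_by_marker(genotypes):
    """Subtract each marker's mean across samples.

    genotypes is a list of rows (samples), each a list of marker values.
    """
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_markers)]
    return [[row[j] - means[j] for j in range(n_markers)] for row in genotypes]

def remove_components(genotypes, components):
    """Center each marker, then regress the given components out of it.

    components is a list of k vectors of length n_samples, assumed to be
    unit-length and mutually orthogonal (e.g. the first k eigenvectors).
    """
    centered = center_by_marker(genotypes)
    n_samples = len(centered)
    corrected = [row[:] for row in centered]
    for j in range(len(centered[0])):
        column = [centered[i][j] for i in range(n_samples)]
        for u in components:
            # Projection of this marker onto component u.
            proj = sum(u[i] * column[i] for i in range(n_samples))
            for i in range(n_samples):
                corrected[i][j] -= proj * u[i]
    return corrected
```

Because the correction is applied marker by marker after centering, the same components can be used for any marker once it is centered, which is consistent with the requirement above to use Center data by marker.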

Figure 5b. Good correction: the p-values lie on the line between 0 and 2 along the X axis.

Figure 5c. Overcorrection: the p-values fall below the line between 0 and approximately 3 along the X axis.

Figure 5d. Undercorrection: the p-values lie above the line between 0 and 3 along the X axis.
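The visual rule in Figures 5b to 5d (on, below, or above the line) can be approximated by averaging each point's vertical distance from y = x over the lower range of the X axis. This is a rough sketch: `classify_correction`, the window `x_max` and the `tolerance` are hypothetical choices, not SVS settings, and a visual check of the plot remains the primary test.

```python
def classify_correction(points, x_max=2.0, tolerance=0.1):
    """Label a Q-Q plot as good, over-corrected, or under-corrected.

    points is a list of (expected, observed) -log10 p pairs. Only points
    with expected value <= x_max are used; x_max and tolerance are
    hypothetical thresholds.
    """
    deviations = [y - x for x, y in points if x <= x_max]
    if not deviations:
        raise ValueError("no points in the window")
    mean_dev = sum(deviations) / len(deviations)
    if mean_dev > tolerance:
        return "under-corrected"  # points sit above the y = x line
    if mean_dev < -tolerance:
        return "over-corrected"  # points sit below the y = x line
    return "good"
```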
