Finally, plot a Q-Q plot of the observed -log10 p-values against the “expected” -log10 rank p-values to confirm that the chosen number of components gives a good solution.
This will generate a Q-Q Plot of the observed vs. expected values. See Figure 5a.
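For readers who want to reproduce the comparison outside the software, the points of such a Q-Q plot can be sketched in a few lines of Python. This is a minimal illustration, not the software's own implementation. The function names `qq_points` and `mean_abs_deviation` are hypothetical, and the expected p-value for rank i out of n is assumed to be i/(n+1), which is one common convention and may differ from the one used to produce the figures.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-values for a Q-Q plot.

    Assumes every p-value lies in (0, 1]; a p-value of 0 would make
    math.log10 raise an error. The expected p-value for rank i of n is
    taken to be i / (n + 1) (an assumption, see the lead-in).
    """
    n = len(p_values)
    ordered = sorted(p_values)  # smallest p-value gets rank 1
    expected = [-math.log10(rank / (n + 1)) for rank in range(1, n + 1)]
    observed = [-math.log10(p) for p in ordered]
    return expected, observed


def mean_abs_deviation(expected, observed):
    """Average vertical distance of the Q-Q points from the y=x line."""
    return sum(abs(o - e) for e, o in zip(expected, observed)) / len(expected)


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    p_values = [0.91, 0.03, 0.47, 0.62, 0.18, 0.75, 0.29, 0.55]
    expected, observed = qq_points(p_values)
    print(mean_abs_deviation(expected, observed))
```

Plotting `observed` against `expected` gives the Q-Q plot. The mean absolute deviation is only a rough numeric companion to the visual check: a smaller value means the points sit closer to the y=x line.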
The Q-Q plots obtained after correcting for 31 and 38 components, respectively, show that 31 components yield the better result: the observed p-values follow the y=x line more closely. See Figure 5b, Figure 5c and Figure 5d. In studies without T-cell artifacts, the answers given by the slope and the F statistic will most likely be closer together. Always examine the Q-Q plots to verify that the selected number of principal components yields a good solution.
Thus, 31 components should be used for correcting this dataset. The first 31 principal components can now be applied to all markers, including markers from non-autosomal chromosomes, as long as Center data by marker is used.
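To illustrate why centering by marker lets the same components be applied to any marker, the sketch below removes a set of components from one marker's values. It is an assumption about the general technique (projecting out orthonormal components defined over samples), not a description of the software's internals. The function names and the toy data are hypothetical.

```python
import math


def center(values):
    """Subtract the marker's mean across samples (centering by marker)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def remove_components(marker_values, components):
    """Return one marker's values with the given components projected out.

    `components` is assumed to be a list of orthonormal vectors, each with
    one entry per sample. Because the components live in sample space, the
    same vectors can be applied to any marker measured on those samples.
    """
    residual = center(marker_values)
    for comp in components:
        score = sum(r * c for r, c in zip(residual, comp))
        residual = [r - score * c for r, c in zip(residual, comp)]
    return residual


if __name__ == "__main__":
    # Toy example: four samples and a single unit-length component.
    s = 1 / math.sqrt(2)
    components = [[s, -s, 0.0, 0.0]]
    print(remove_components([4.0, 0.0, 3.0, 1.0], components))
```

In this toy case the variation shared between the first two samples is removed, while the variation in the last two samples, which the component does not describe, is left in place.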