2.22. Numeric Regression Analysis

Linear and logistic regression, stepwise linear and logistic regression, and permutation tests against one dependent variable can be performed from the Numeric Regression window. The regressors are numeric variables, tested individually or in a moving window, optionally together with numeric or categorical covariates.

Individual regressions may either be performed with all variables and covariates together in one regression (“full model only”) or as a pair of regressions, one with all variables and covariates together (the “full model”) and a second with only some of the covariates (the “reduced model”), to obtain a “full-vs-reduced-model p-value”. (See Full Versus Reduced Model Regression Equation.)

The covariates used for regression may optionally consist of interactions between other covariates that are derived directly from the spreadsheet.

For an overview of the theories behind regression analysis in Golden Helix SVS, see Linear Regression and Logistic Regression.

Note

The following discussion applies equally well, except where noted, to Genotypic Regression Analysis.

2.22.1. Full Versus Reduced Model Regression Equation

It is sometimes desirable to “correct for” binary, continuous, or categorical variables, otherwise known as “covariates”. These covariates, or first-order interactions between them, may be influencing the dependent variable response. Correcting for the covariates lets the user see what effect the remaining variables have beyond what the covariates already explain.

To do this, first a regression equation which includes only the dependent and the reduced model covariates (plus a constant term) is calculated (the “reduced model”). Next, a regression which includes all of the variables including all full model covariates (along with all reduced model covariates and the constant term) is calculated (the “full model”). The significance of the full versus the reduced model is then calculated with an F-test (for linear regression) or a likelihood ratio statistic (for logistic regression).
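As a minimal sketch of the F-test arithmetic (not the SVS implementation), the Python below fits an intercept-only reduced model and a one-predictor full model by ordinary least squares, then forms the full-versus-reduced F statistic from the two residual sums of squares. The function names and data are hypothetical. Converting F to a p-value needs the F distribution (for example `scipy.stats.f.sf(F, df_num, df_den)`), which is left out to keep the example standard-library only.

```python
from statistics import fmean


def rss(y, yhat):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat))


def full_vs_reduced_f(rss_reduced, rss_full, k_reduced, k_full, n):
    """F statistic comparing nested linear models.

    k_reduced and k_full count all estimated coefficients, including the intercept.
    """
    df_num = k_full - k_reduced  # extra coefficients in the full model
    df_den = n - k_full          # residual degrees of freedom of the full model
    f = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f, df_num, df_den


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
n = len(y)

# Reduced model: intercept only, so every prediction is the mean of y.
y_bar = fmean(y)
rss_r = rss(y, [y_bar] * n)

# Full model: intercept plus one predictor, closed-form least squares.
x_bar = fmean(x)
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
intercept = y_bar - slope * x_bar
rss_f = rss(y, [intercept + slope * a for a in x])

f, df_num, df_den = full_vs_reduced_f(rss_r, rss_f, 1, 2, n)
print(f"F = {f:.2f} on ({df_num}, {df_den}) degrees of freedom")
```

For logistic regression, the analogous comparison replaces the residual sums of squares with log-likelihoods and the F statistic with a likelihood ratio chi-squared statistic.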

See Full Versus Reduced Model Regression Equation (for linear regression) and Full Versus Reduced Model Regression Equation (for logistic regression) for more information.

2.22.2. Performing Analysis

To perform Numeric Regression Analysis, open a spreadsheet and select a column for the dependent variable. The dependent variable must be either quantitative (real-valued or integer-valued) or a binary case/control status column. To open the Regression window, select the Numeric > Regression Analysis menu item. This feature currently supports only spreadsheets with exactly one column set as dependent. Categorical dependent columns are not currently supported.

The Regression Analysis window (see Figures Regression Analysis – Regression Parameters and Regression Analysis – Regression Parameters (When Covariate-Column Interactions Are Specified)) allows for various regression options to be set or changed. A list and brief description of these options is as follows:

• Regression Parameters: The first tab of the Regression Analysis window allows for the general regression parameters to be set. The parameters are:

• Selection Parameters: These options allow choosing how the covariates are themselves selected for the full and reduced model(s).

• Regress on each of the ### numeric columns tests each column individually.

• Regress on a moving window with parameters: tests windows of covariates.

• Perform single regression with selected covariates allows the user to explicitly select the covariates for one regression to be performed.

• Regress on covariate-column interactions (on ### numeric columns) takes each column and tests on interactions between selected covariates and that column.

Note

For Genotypic Regression, read “### genotypic columns” in the above descriptions and where these descriptions are used below.

• Regression Options:

• The user can specify if they wish to perform stepwise regression on the covariates which are only in the Full Model. Specification of a p-value cutoff and the method of backwards elimination or forward selection is required.

• The Output residual spreadsheet option is available when Perform single regression with selected covariates is selected.

• Full Model Covariates and Reduced Model Covariates. These column chooser boxes are activated based on the selection parameters described above.

• For Regress on each of the ### numeric columns and Regress on a moving window with parameters:, checking Add additional full model covariate(s) activates the Full Model Covariates box. The Full Model Covariates box is automatically active if Perform single regression with selected covariates is selected.

• For any of these three selection parameters, checking Correct for covariate(s) activates the Reduced Model Covariates box.

• For Regress on covariate-column interactions, these boxes are renamed Covariate-Column Interactions and Additional Reduced Model Covariates. (See Regression Analysis – Regression Parameters (When Covariate-Column Interactions Are Specified).)

Covariate-Column Interactions is automatically active for this option, while checking Correct for additional covariate(s) activates the Additional Reduced Model Covariates box. (See Selecting Covariates for Covariate-Column Interactions.)

• Output Parameters: This tab (see Figure Regression Analysis – Output Parameters) allows multiple testing corrections to be set. Additional output, such as that used for the creation of P-P or Q-Q plots, and detailed output options are available in this tab.

Selection Parameters

The Regression Analysis allows the selection of the full model regressors or covariates. These options are detailed below:

• Regress on each of the ### numeric columns Each numeric column in the spreadsheet is tested individually as the only additional full model covariate, excluding the dependent column and any specified additional full model or reduced model covariates.

• Correct for covariate(s) Checking this box activates the Reduced Model Covariates box and allows the user to specify additional covariates to be included in the reduced model, thus correcting for any possible confounding effects caused by the covariates. See Full Versus Reduced Model Regression Equation for more information.

• Add additional full model covariates Checking this box activates the Full Model Covariates box and allows the user to specify additional covariates to be included in the full model. The resulting Full model p-value or Full vs. Reduced model p-value is associated with the effect of each column, plus the additional full model covariates specified.

• Use a moving window of regressors: There are two options for the definition of the moving window – either a moving window of a fixed number of columns or, if a marker map is applied, a dynamic moving window size with a fixed number of base pairs.

• Fixed window size: Specifies that a fixed number of numeric columns should be used for the moving window.

• Dynamic window size in base pairs: Defines which columns fall within the window using both a distance in base pairs and a column count. The “Kb” field sets the maximum distance, in kilobase pairs, that a window may span, and the “max columns” field, if used, caps the number of columns included within that distance. The window will not cross chromosome boundaries as defined in the marker map. This option is only available for spreadsheets where a marker map has been applied.

• Correct for covariate(s) Checking this box activates the Reduced Model Covariates box and allows the user to specify additional covariates to be included in the reduced model, thus correcting for any possible confounding effects caused by the covariates. See Full Versus Reduced Model Regression Equation for more information.

• Add additional full model covariates Checking this box activates the Full Model Covariates box and allows the user to specify additional covariates to be included in the full model. The resulting Full model p-value or Full vs. Reduced model p-value is associated with the effect of each column, plus the additional full model covariates specified.

• Perform regression with selected covariates only Allows the user to select the numeric and/or categorical covariate columns to include in the full and reduced models or the first-order interactions between covariates in the regression. Selecting this option automatically activates the Full Model Covariates box; at least one covariate must be specified in this box.

• Regress on covariate-column interactions (on ### numeric columns) Takes each numeric column, excluding the dependent column and any specified covariates, and tests on the interactions between that column and numeric and/or categorical covariate columns which you select.

When you make this selection, the prompts will change as shown in Regression Analysis – Regression Parameters (When Covariate-Column Interactions Are Specified).

Use the Covariate-Column Interactions box, which is always active when you select Regress on covariate-column interactions, to specify the columns that will form interaction terms with each numeric column being scanned.

• Checking Correct for additional covariate(s) activates the Additional Reduced Model Covariates box and allows you to specify additional covariates to be included in the reduced model, thus correcting for any possible confounding effects caused by the other covariates.

The tests in this mode are always done using a full vs. reduced model (see Full Versus Reduced Model Regression Equation). The interactions themselves are placed into the Full Model. The Reduced Model consists of:

• Any additional reduced-model covariates you have selected.

• The interacting covariates, taken just by themselves.

• The numeric column being interacted with, taken just by itself.
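The two window definitions under Use a moving window of regressors can be sketched in Python as follows. This is a minimal illustration under assumptions, not SVS code: each window starts at a column and extends forward (the results spreadsheet labels a window by its first regressor), the dynamic distance limit is measured from the window's first marker, and the fixed-size sketch ignores chromosome boundaries because the text does not say how they are handled. The function names and the `(chromosome, position)` marker tuples are hypothetical.

```python
def fixed_windows(n_columns, size):
    """All windows of `size` consecutive column indices (0-based)."""
    return [list(range(start, start + size)) for start in range(n_columns - size + 1)]


def dynamic_windows(markers, kb, max_columns=None):
    """Windows bounded by base-pair distance that never cross a chromosome.

    markers: (chromosome, position_bp) tuples sorted by chromosome, then position.
    kb: maximum window span in kilobase pairs, measured from the window's first marker.
    max_columns: optional cap on the number of columns in one window.
    """
    limit_bp = kb * 1000
    windows = []
    for start, (chrom, pos) in enumerate(markers):
        window = [start]
        for j in range(start + 1, len(markers)):
            next_chrom, next_pos = markers[j]
            if next_chrom != chrom or next_pos - pos > limit_bp:
                break  # chromosome boundary or distance limit reached
            if max_columns is not None and len(window) >= max_columns:
                break  # column cap reached
            window.append(j)
        windows.append(window)
    return windows


markers = [("1", 1000), ("1", 1500), ("1", 4000), ("2", 100), ("2", 900)]
print(fixed_windows(len(markers), 3))                 # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(dynamic_windows(markers, kb=1))                 # [[0, 1], [1], [2], [3, 4], [4]]
print(dynamic_windows(markers, kb=5, max_columns=2))  # [[0, 1], [1, 2], [2], [3, 4], [4]]
```

With a 1 Kb limit, the marker at 4000 bp is too far from the one at 1000 bp, and the window starting at the last marker of chromosome 1 stops at the boundary instead of reaching into chromosome 2.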

Regression Options

Whether the regression is linear or logistic, and whether it is stepwise, are indicated on the top right of the Regression Analysis window.

In the case of a binary dependent variable, the regressions will be logistic; for an integer or real-valued dependent column, the regressions will be linear.

The two regression options which are selectable here are:

• Stepwise Regression: Selecting this option specifies that each linear or logistic regression should be done as the specified stepwise regression procedure, either backwards elimination or forward selection. A p-value cut-off must be specified when running stepwise regression.

Backward elimination starts with all of the full-model-only covariates and repeatedly removes the least significant covariate, stopping when removing any remaining covariate would be significant at the specified stepwise p-value cut-off. “Significant” here refers to a full-vs-reduced p-value in which “the full model” is the current set of full-model-only covariates (plus any reduced-model covariates) and “the reduced model” is the same set without the covariate being considered for elimination (plus any reduced-model covariates).

Forward selection selects the most significant covariate and keeps adding the next most significant covariate until adding a further covariate is no longer significant at the cut-off. “Significant” here refers to a full-vs-reduced p-value in which “the full model” is the current set of full-model-only covariates plus the covariate being considered for addition (plus any reduced-model covariates) and “the reduced model” is the current set of full-model-only covariates (plus any reduced-model covariates).

If the stepwise regression option is not selected, then for each column or window, or with the specified covariates, one single linear or logistic regression will take place.

Note

Stepwise regression is only useful if there are at least two full-model-only covariates.

• Output Residual spreadsheet If this option is checked, a residual spreadsheet will be created along with the results view from the regression. This spreadsheet will contain the actual, predicted, and residual values for each sample, as well as the spreadsheet values for the regressors.

Note

The Output Residual spreadsheet option is available only when you select Perform regression with selected covariates only (see Selection Parameters).
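The two stepwise procedures can be sketched generically. The Python below is a minimal illustration, not SVS code: it takes a hypothetical `fvr_p(full, reduced)` callback that returns the full-vs-reduced p-value for two sets of full-model-only covariates (the callback is assumed to add any reduced-model covariates itself) and applies the selection rules described under Stepwise Regression with cut-off `alpha`. The exact stopping comparison (`<` versus `<=`) is an assumption.

```python
from typing import Callable, FrozenSet, Iterable

# fvr_p(full, reduced) -> full-vs-reduced p-value for two sets of full-model-only covariates.
PValueFn = Callable[[FrozenSet[str], FrozenSet[str]], float]


def backward_elimination(candidates: Iterable[str], fvr_p: PValueFn, alpha: float) -> FrozenSet[str]:
    current = frozenset(candidates)
    while current:
        # Test dropping each covariate: full = current set, reduced = current set without it.
        p_drop = {c: fvr_p(current, current - {c}) for c in current}
        weakest = max(p_drop, key=p_drop.__getitem__)
        if p_drop[weakest] < alpha:
            break  # removing any remaining covariate would be significant
        current = current - {weakest}
    return current


def forward_selection(candidates: Iterable[str], fvr_p: PValueFn, alpha: float) -> FrozenSet[str]:
    current: FrozenSet[str] = frozenset()
    remaining = set(candidates)
    while remaining:
        # Test adding each covariate: full = current set plus it, reduced = current set.
        p_add = {c: fvr_p(current | {c}, current) for c in remaining}
        strongest = min(p_add, key=p_add.__getitem__)
        if p_add[strongest] >= alpha:
            break  # no further covariate is significant
        current = current | {strongest}
        remaining.discard(strongest)
    return current


# Toy callback: the p-value depends only on the single covariate that differs.
TOY_P = {"age": 0.001, "sex": 0.03, "batch": 0.4}


def toy_fvr_p(full: FrozenSet[str], reduced: FrozenSet[str]) -> float:
    (changed,) = full - reduced
    return TOY_P[changed]


print(sorted(backward_elimination(TOY_P, toy_fvr_p, alpha=0.05)))  # ['age', 'sex']
print(sorted(forward_selection(TOY_P, toy_fvr_p, alpha=0.05)))     # ['age', 'sex']
```

In a real regression the p-value for a covariate changes as other covariates enter or leave, which is why each step re-tests every candidate rather than ranking them once.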

Full and Reduced Model Covariates

If you have selected Regress on each of the ### numeric columns, Use a moving window of regressors, or Perform single regression with selected covariates, select covariates and/or interaction terms as follows.

If either the Full Model Covariates or the Reduced Model Covariates box is active, additional covariates can be specified in the active section. To include a covariate in the analysis, click on the Add Covariate button. This will open a dialog allowing you to select the covariate(s) to use in the regression equation. Then, select the covariate(s) to include and click Add. If you would like to add all of the covariates in the list, click Add All. The selected covariates will be shown in either the Full Model Covariates or the Reduced Model Covariates list. To remove a covariate, select the covariate(s) to remove, and click Remove Selected. This will remove the item from the list and from the regression equation. To remove all covariates, click Clear List.

To include first-order interactions between (fixed) pairs of covariates, click the Add Interaction button. This will open a dialog which displays two lists, each containing all of the covariate column names within the spreadsheet. Select the term(s) from each of the two lists which you would like to include and click Add. All selected items from the list on the left will be paired with all the selected items from the list on the right, and an item for each pair will be added to the Full Model Covariates or the Reduced Model Covariates list. If any of the selected items in either window represent categorical columns, then sub-items representing the dummy variables used in regression for each category will be paired with the items or sub-items from the other window. (Values from each pair are multiplied to create a “new” covariate, which is then used in the regression equation.)

When you have added all of the interactions, click Close to return to the regression window. All listed interactions will be included in the analysis, so unwanted interactions must be removed in order to exclude them. To remove an interaction, select the item(s) to remove and click Remove Selected.
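The pairing rule for first-order interactions can be sketched in Python. This is a minimal illustration under stated assumptions: categorical columns are expanded into 0/1 indicator (dummy) variables with the first level treated as a reference and dropped, which is a common convention but is not confirmed here for SVS, and the function names are hypothetical.

```python
def dummies(name, values):
    """0/1 indicator columns for a categorical column, dropping the first (reference) level."""
    levels = sorted(set(values))
    return {f"{name}={lvl}": [1.0 if v == lvl else 0.0 for v in values] for lvl in levels[1:]}


def pairwise_interactions(left, right):
    """Pair every left term with every right term; each product is a new covariate."""
    terms = {}
    for lname, lvals in left.items():
        for rname, rvals in right.items():
            terms[f"{lname}*{rname}"] = [a * b for a, b in zip(lvals, rvals)]
    return terms


age = {"age": [30.0, 45.0, 52.0, 61.0]}
site = dummies("site", ["A", "B", "C", "B"])
for term, values in pairwise_interactions(age, site).items():
    print(term, values)
# age*site=B [0.0, 45.0, 0.0, 61.0]
# age*site=C [0.0, 0.0, 52.0, 0.0]
```

Selecting one numeric column on the left and one three-level categorical column on the right therefore yields one interaction item per dummy sub-item, not a single product term.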

Selecting Covariates for Covariate-Column Interactions

If you have selected Regress on covariate-column interactions (on ### numeric columns), select interaction terms as follows.

Note that in this mode, at least one interaction term must be specified.

To specify the columns which will be used to form (first-order) interaction terms with the numeric column being scanned, use the Covariate-Column Interactions box. To include one specific column which is to form interactions, click on the Add Col Interaction button. This will open a dialog allowing you to select covariate(s) to use as interaction terms. Then, select the covariate(s) to include and click Add. If you would like to add all of the covariates in the list, click Add All. The selected covariates will be shown in the Covariate-Column Interactions list. To remove a covariate, select the covariate(s) to remove, and click Remove Selected. This will remove the item(s) from the Covariate-Column Interactions list and from the regression equation. To remove all covariates (so you may start over again), click Clear List.

Each covariate is (1) used by itself as a reduced-model covariate, and (2) multiplied value-by-value with the column being scanned to create a “new” covariate, which is then used in the regression equation as a full-model-only covariate.

To include second-order interactions (between two covariates and the column being scanned), click the Add 3-Way Interaction button. This will open a dialog which displays two lists, each containing all of the covariate column names within the spreadsheet. Select the term(s) from each of the two lists which you would like to include and click Add. All selected items from the list on the left will be paired with all the selected items from the list on the right, and an item for each pair will be added to the Covariate-Column Interactions list. If any of the selected items in either window represent categorical columns, then sub-items representing the dummy variables used in regression for each category will be paired with the items or sub-items from the other window.

Values from each pair are multiplied with each other to create a “new” covariate, which is (1) used by itself as a reduced-model covariate and (2) multiplied with the column being scanned to create a “newer” covariate, which is then used in the regression equation as a full-model-only covariate.
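The construction of the model terms for one scanned column can be sketched as below. It follows the rules stated in this section and under Selection Parameters: the scanned column and each interacting covariate (or covariate-pair product, for a second-order interaction) enter the reduced model by themselves, and each product with the scanned column is a full-model-only term. This is a minimal sketch with hypothetical names; categorical expansion and any additional reduced-model covariates are left out.

```python
def multiply(*columns):
    """Value-by-value product of equal-length columns."""
    out = [1.0] * len(columns[0])
    for col in columns:
        out = [a * b for a, b in zip(out, col)]
    return out


def covariate_column_terms(scanned_name, scanned, interacting):
    """Return (reduced_terms, full_only_terms) for one scanned column.

    interacting maps a term name to a tuple holding one covariate column
    (first-order interaction) or two covariate columns (second-order interaction).
    """
    reduced = {scanned_name: scanned}  # the scanned column by itself
    full_only = {}
    for name, cols in interacting.items():
        base = multiply(*cols)  # the covariate itself, or the product of a covariate pair
        reduced[name] = base  # (1) used by itself as a reduced-model covariate
        full_only[f"{name}*{scanned_name}"] = multiply(base, scanned)  # (2) times the scanned column
    return reduced, full_only


age, bmi = [1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 1.0, 1.0]
scanned = [0.0, 1.0, 2.0, 1.0]
reduced, full_only = covariate_column_terms("col", scanned, {"age": (age,), "age*bmi": (age, bmi)})
print(reduced)
print(full_only)
```

The full model then consists of every reduced-model term plus the full-model-only products, so the full-vs-reduced test isolates the interaction effect.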

When you have added all of the second-order interactions, click Close to return to the regression window. All listed interactions will be included in the analysis, so unwanted interactions must be removed in order to exclude them. To remove an interaction, select the item(s) to remove and click Remove Selected.

If the Additional Reduced Model Covariates box is active, additional reduced model covariates and/or reduced model first-order interaction terms may be added to this box in a manner similar to that specified in Full and Reduced Model Covariates.

Note on Missing Values

For Numeric Regression, all missing values will be dropped from the analysis, both from the predictor variables and from the dependent variable.
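Dropping missing values from both the predictors and the dependent amounts to complete-case analysis: a sample is used only if every value entering the regression is present. A minimal Python sketch, assuming `None` marks a missing value and that filtering is done separately for each regression (so a sample missing one scanned column can still be used when another column is tested); both points are assumptions, not documented behavior.

```python
def complete_case_indices(y, *predictors):
    """Indices of samples where y and every predictor are present (None = missing)."""
    return [i for i, row in enumerate(zip(y, *predictors)) if all(v is not None for v in row)]


y = [1.2, None, 3.1, 2.2, 0.9]
x = [0.5, 0.7, None, 0.1, 0.3]
covariate = [30, 41, 52, None, 47]
keep = complete_case_indices(y, x, covariate)
y_used = [y[i] for i in keep]
print(keep, y_used)  # [0, 4] [1.2, 0.9]
```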

Note

For a discussion of missing values for Genotypic Regression Analysis, please see the Missing (Non-Covariate) Genotype Values section of Genotypic Parameters.

To get the most out of your regression results, some convenient derived statistics can be computed from your p-values.

• Output -log10(P): Computes -log10(P) for each p-value and for each multiple-testing-corrected p-value.

• Output data for P-P/Q-Q plots: Computes expected value for each p-value and for each multiple-testing-corrected p-value. By plotting the expected vs. actual P values, you can create P-P or Q-Q plots. This option forces the -log10(P) output as well.

Note

Output data for P-P/Q-Q plots is only available if you have selected Regress once on each column, Use a moving window of regressors, or Regress on covariate-column interactions.
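A sketch of these derived values in Python. The -log10 transform is exact; for the expected p-values, the sketch assumes the common uniform-quantile convention rank/(n + 1) for the i-th smallest of n p-values, which may differ from the formula SVS actually uses.

```python
import math


def neg_log10(p):
    return -math.log10(p)


def expected_p_values(p_values):
    """Expected uniform quantile rank / (n + 1) for each p-value, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    expected = [0.0] * n
    for rank, i in enumerate(order, start=1):
        expected[i] = rank / (n + 1)
    return expected


p = [0.2, 0.001, 0.05]
exp_p = expected_p_values(p)
print(exp_p)  # [0.75, 0.25, 0.5]
# Q-Q plot points: (expected -log10 p, observed -log10 p)
points = [(neg_log10(e), neg_log10(o)) for e, o in zip(exp_p, p)]
```

Under the null hypothesis the observed and expected values should fall near the diagonal; points rising above it at the high end suggest real associations (or inflation).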

Multiple Testing Correction

When many tests are performed, some may produce a good test statistic by chance alone. Multiple testing corrections are designed to help guard against mistaking such chance results for real effects. You may optionally select one or more of the following multiple testing corrections.

• Bonferroni adjustment (on N covariates) Multiplies p-values by N, but does not allow the result to be more than 1. Here, N is the number of successful regression tests that have been performed (in other words, the number of columns or window positions that have been successfully tested).

• False Discovery Rate (FDR) A less severe adjustment. See False Discovery Rate for an explanation of this algorithm.

• Single value permutations and/or Full scan permutations Permutation testing of the linear and logistic regression models permutes the dependent variable, then runs the regressions over again, checking the significance of these regressions. This is distinct from checking the “fit” of the permuted dependent to the original regression results from a given set of regressors. The object is to see whether by chance, a different set of dependents could have had a better relationship or “fit” with the covariates and regressors. This is tested through performing a new regression for each permutation.

See Permutation Testing Methodology for a more detailed explanation and examples of permutation testing.
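The Bonferroni rule and the Benjamini-Hochberg step-up procedure, the most common form of FDR adjustment, can be sketched as follows. Whether the FDR output matches Benjamini-Hochberg adjusted p-values exactly is an assumption here; see False Discovery Rate for the algorithm actually applied.

```python
def bonferroni(p_values):
    """Multiply each p-value by N (number of successful tests), capped at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest p-value down, keeping a running minimum of p * n / rank.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1 = smallest p-value
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(p))          # approximately [0.04, 0.16, 0.12, 0.8]
print(benjamini_hochberg(p))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```

The FDR values are never larger than the Bonferroni values, which is why FDR is described as the less severe adjustment.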

Note

If you have selected Perform regression with selected covariates only, the only available correction will be Single value permutations.
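The difference between single-value and full-scan permutation can be illustrated with a generic sketch. It assumes a statistic where larger means more significant (squared correlation stands in for an F or chi-squared statistic) and uses the common (count + 1) / (permutations + 1) estimator; the estimator, the seeding, and the function names are assumptions, not the SVS implementation. Single-value permutation compares each test only with its own permuted statistics, while full-scan permutation compares it with the best statistic across all tests in each permuted scan.

```python
import random


def r_squared(y, x):
    """Squared Pearson correlation, used here as a stand-in test statistic."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_values(y, columns, statistic, n_perm=1000, seed=0):
    """Single-value and full-scan permuted p-values; larger statistic = more significant."""
    rng = random.Random(seed)
    observed = [statistic(y, col) for col in columns]
    single_hits = [0] * len(columns)
    scan_hits = [0] * len(columns)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permute the dependent; regressors stay fixed
        perm_stats = [statistic(y_perm, col) for col in columns]  # a fresh regression per column
        scan_max = max(perm_stats)  # best statistic anywhere in this permuted scan
        for j, obs in enumerate(observed):
            single_hits[j] += int(perm_stats[j] >= obs)
            scan_hits[j] += int(scan_max >= obs)
    single = [(h + 1) / (n_perm + 1) for h in single_hits]
    full_scan = [(h + 1) / (n_perm + 1) for h in scan_hits]
    return single, full_scan


rng = random.Random(1)
x_signal = [float(i) for i in range(20)]
x_noise = [rng.gauss(0.0, 1.0) for _ in range(20)]
y = [2.0 * v + rng.gauss(0.0, 1.0) for v in x_signal]
single, full_scan = permutation_p_values(y, [x_signal, x_noise], r_squared, n_perm=200)
print(single, full_scan)
```

Because the scan maximum is at least as large as any single permuted statistic, a full-scan permuted p-value is never smaller than the corresponding single-value permuted p-value.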

Viewing Detailed Results

If you have selected Perform regression with selected covariates only, detailed results (Regression Statistics Results Viewer) will always be shown for the regression.

Otherwise, if you have selected Regress on each of the ### numeric columns, Use a moving window of regressors, or Regress on covariate-column interactions (on ### numeric columns), the only outputs shown will be in rows of the Regression Results Spreadsheet, unless you specify criteria for which regressions you would like to see detailed results. To see detailed results for some regressions, check Output detailed results if… and set the desired criteria. There are three criteria:

• Value to use. These may be:

• P-Value Full vs. Reduced Model (available and default for full vs. reduced testing)

• -log 10 P-Value FullvsRed Model (available for full vs. reduced testing)

• P-Value Full Model (default for full-model-only testing)

• -log 10 P-Value Full Model

• R Squared Full Model (available for linear regression)

• Type of comparison. This may be “<” (default), “<=” ($\le$), “>”, or “>=” ($\ge$).

• Threshold. (Defaults to 0.05.)

Detailed output will be generated for those regressions which match the criteria. All of these outputs will be placed into a single detailed-output viewer.
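The three criteria combine into a simple filter over the results rows. A minimal Python sketch, in which the dictionary-per-row structure is hypothetical and skipping rows that lack the chosen value (for example, a failed regression) is an assumption:

```python
import operator

# Comparison choices offered on the Output Parameters tab.
COMPARISONS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def select_for_detailed_output(rows, value_name, comparison="<", threshold=0.05):
    """Rows whose chosen value satisfies `value <comparison> threshold`; rows lacking it are skipped."""
    compare = COMPARISONS[comparison]
    return [row for row in rows if row.get(value_name) is not None and compare(row[value_name], threshold)]


results = [
    {"label": "col1", "P-Value Full Model": 0.003},
    {"label": "col2", "P-Value Full Model": 0.21},
    {"label": "col3", "P-Value Full Model": None},  # e.g. a regression that failed
    {"label": "col4", "P-Value Full Model": 0.049},
]
print([r["label"] for r in select_for_detailed_output(results, "P-Value Full Model")])  # ['col1', 'col4']
```

For a -log10 P-value or R Squared criterion, the comparison would normally be flipped to “>” or “>=”, since larger values of those quantities indicate stronger results.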

2.22.3. Running the Regression

Click Run to start the regression analysis procedure.

Note

Sometimes a regression may fail due to insufficient rank in the coefficient matrix. This can result from too few observations or from the inclusion of “collinear” regressors. A collinear regressor is one which is a linear combination of one or more other regressors.
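To see why collinear regressors make a fit fail, the sketch below computes the rank of a small matrix of regressor values (with a constant column for the intercept) by Gaussian elimination with partial pivoting. When the rank is smaller than the number of columns, the coefficients are not uniquely determined. This pure-Python check and its tolerance are illustrative assumptions, not what SVS does internally.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix given as a list of rows, via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the unused row with the largest absolute entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


x1 = [1, 2, 3, 4]
x2_collinear = [2 * v for v in x1]  # exactly 2 * x1
x2_independent = [1, 0, 1, 0]
design_bad = [[1, a, b] for a, b in zip(x1, x2_collinear)]  # intercept, x1, x2
design_good = [[1, a, b] for a, b in zip(x1, x2_independent)]
print(matrix_rank(design_bad), matrix_rank(design_good))  # 2 3
```

The same rank deficiency appears with too few observations: a design with more columns than rows can never reach full column rank.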

2.22.4. Regression Outputs

There are three outputs which are possible from a Numeric Regression (although at most two may be output from any single regression). These are:

• A residual spreadsheet. This is output if you have selected Perform regression with selected covariates only and also selected to output a residual spreadsheet.

• A regression results spreadsheet. This is always output if you have selected Regress on each of the ### numeric columns, Use a moving window of regressors, or Regress on covariate-column interactions (on ### numeric columns).

• A regression statistics results viewer. This is always output if you have selected Perform regression with selected covariates only. Otherwise, it is output only if detailed output (Viewing Detailed Results) is specified and the criteria for detailed output are met.

If a residual spreadsheet is produced (see Figure Linear Regression Residual Spreadsheet), it will contain the actual, predicted and residual values of the dependent variable for each sample. The residual value of a sample is defined as the difference between the sample’s actual value and its predicted value from the regression. In addition, all full-model and reduced-model covariates and interaction terms originally specified for the regression are output.
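For a linear model, the three per-sample columns of the residual spreadsheet follow directly from the fitted coefficients. A minimal sketch with a hypothetical intercept and coefficient:

```python
def residual_rows(y, X, intercept, coefficients):
    """(actual, predicted, residual) for each sample of a fitted linear model."""
    rows = []
    for actual, x in zip(y, X):
        predicted = intercept + sum(b * v for b, v in zip(coefficients, x))
        rows.append((actual, predicted, actual - predicted))  # residual = actual - predicted
    return rows


y = [2.4, 4.6, 6.5]
X = [[1.0], [2.0], [3.0]]
for actual, predicted, residual in residual_rows(y, X, intercept=0.5, coefficients=[2.0]):
    print(f"{actual:.2f} {predicted:.2f} {residual:+.2f}")
```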

Note

Strictly speaking, residuals do not make as much sense for logistic regression as they do for linear regression because the distribution of a logistic regression residual separates into two parts. However, this spreadsheet may still be used as a crude gauge of how well the regression model predicts the observed values of the dependent variable.

If you checked Regress on each of the ### numeric columns, Use a moving window of regressors, or Regress on covariate-column interactions (on ### numeric columns), a spreadsheet of regression results for each regression model calculated will be output (see Figure Linear Regression Results Spreadsheet). Each row of this spreadsheet corresponds to one regression model. For a moving window, the row label is the first regressor in the window; when regressing once on each column or regressing on covariate-column interactions, the row label is the column tested. The row label does not reflect any other covariates that are used.

Note

More detailed results (Regression Statistics Results Viewer) for any interesting regression model can be found either in the Regression Statistics Results Viewer, if the model's p-value (or other selected value) meets the specified criteria (see Viewing Detailed Results), or by running a covariate-only regression model that includes all regressors and any full or reduced model covariates used in the “interesting” model.

The fields which are output to the regression results spreadsheet are as follows:

• P-Value results for the full vs. reduced model (if the full vs. reduced model was used)

• Full vs. reduced p-value

• -log base 10 of the full vs. reduced p-value (if Output -log10(P) or Output data for P-P/Q-Q plots was specified)

• Expected full vs. reduced p-value (if Output data for P-P/Q-Q plots was specified)

• -log base 10 of the expected full vs. reduced p-value (if Output data for P-P/Q-Q plots was specified)

• P-Value results for the full model

• Full model p-value

• -log base 10 of the full model p-value (if Output -log10(P) or Output data for P-P/Q-Q plots was specified)

• Expected full model p-value (if Output data for P-P/Q-Q plots was specified)

• -log base 10 of the expected full model p-value (if Output data for P-P/Q-Q plots was specified)

• R-squared (for linear regressions)

• Mean Y

• Specific outputs for regressing on each of the columns (when stepwise regression is NOT selected).

• Residual Standard Error (for linear regressions)

• Beta 0 (logistic regression) or Intercept (linear regression)

• Beta 0 Standard Error (logistic regression) or Intercept Standard Error (linear regression)

• Predictor Beta

• Predictor Beta SE

• Odds Ratio (for logistic regression only)

• OR Lower Confidence Bound (for logistic regression only)

• OR Upper Confidence Bound (for logistic regression only)

• Specific outputs for regressing on covariate-column interactions (when stepwise regression is NOT selected).

• The Beta and Beta Standard Error values for all of the full-model terms (both specified and implied) are output, as follows:

• For each specified covariate and interaction term (e.g. “TermA”),

• The beta coefficient for this term (called, e.g., TermA Beta) and

• The standard error associated with this coefficient (e.g. TermA Beta SE)

are output.

• Then, for the predictor,

• The beta coefficient (called Predictor Beta) and

• The standard error associated with it (called Predictor Beta SE)

are output.

• Finally, for each predictor interaction or three-way interaction,

• The beta coefficient for this term (called TermA*Predictor (Interaction) Beta) and

• The standard error associated with this coefficient (called TermA*Predictor (Interaction) Beta SE)

are output.

The log associated with the spreadsheet summarizes what actually constitutes the full model and the reduced model for this series of regressions.

• Regression degrees of freedom for the Full Model

• Regression degrees of freedom for the Reduced Model

• Residual degrees of freedom for the Full Model

• Specific outputs for regressing either on each of the columns or on covariate-column interactions (when stepwise regression is NOT selected).

• Value(s) of the main statistic:

• Chi-square Full Model (logistic full vs. reduced model)

• Chi-square Full vs. Reduced Model (logistic full vs. reduced model)

• F Full Model (linear full vs. reduced model)

• F Full vs. Reduced Model (linear full vs. reduced model)

• Chi-square (logistic full model only)

• F (linear full model only)

• The sample size

• Multiple testing correction. Depending on what you have selected, these outputs may include:

• Bonferroni P

• FDR

• Single-Value Permuted P

• -log10 Single-Value Permuted P

• expected Single-Value Permuted P-Value

• expected -log10 Single-Value Permuted P

• Full-Scan Permuted P

• -log10 Full-Scan Permuted P

• expected Full-Scan Permuted P-Value

• expected -log10 Full-Scan Permuted P

• Stepwise covariates actually used (when stepwise IS selected)

• Selected Regressors

Note

See Regression Results Spreadsheet for outputs provided by Genotypic Regression in addition to the above outputs.

Regression Statistics Results Viewer

A Regression Statistics Results Viewer (see Figure Linear Regression Statistics Results Viewer) will be displayed for a single regression or, if Output detailed results if… was selected in the Output Parameters tab of the Regression Analysis window, for all regressions that meet the criteria specified on that tab.

Linear Regression Statistics

The output shown in the Regression Statistics Results Viewer for linear regressions is described below.

Full Model Only Regression

If only a full model was used for the regression equation, the following model statistics are displayed for both normal and stepwise regression:

• Name of the response variable.

• Unsigned multiple correlation coefficient $R$, where $R = \sqrt{R^2}$.

• Coefficient of determination $R^2$.

• Adjusted $R^2$. This statistic is meant to compensate for many regressors, each explaining small portions of the variation by chance alone.

• Sample size.

• Residual standard error.

• Unbiased standard deviation of the response.

• Value of the F-statistic.

• P-value of the F-statistic for the regression model.

• Single-value permuted p-value, if single-value permutation testing was selected.

• Full-scan permuted p-value, if full-scan permutation testing was selected.

• Number of permutations, if permutation testing was selected.

• Regression degrees of freedom.

• Residual degrees of freedom.

• Total degrees of freedom.

• Y-intercept.

• Intercept standard error.
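Most of these model statistics follow standard least-squares definitions. A minimal sketch, assuming `k` regressors plus an intercept, observed values `y`, and least-squares fitted values `y_hat`; that SVS uses exactly these textbook formulas (for example, the adjusted $R^2$ denominator) is an assumption.

```python
import math
from statistics import fmean, stdev


def linear_model_summary(y, y_hat, k):
    """Textbook summary statistics for a least-squares fit with k regressors plus an intercept."""
    n = len(y)
    y_bar = fmean(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    tss = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    r2 = 1.0 - rss / tss
    df_reg, df_res, df_tot = k, n - k - 1, n - 1
    return {
        "R": math.sqrt(r2),  # unsigned multiple correlation
        "R^2": r2,
        "adjusted R^2": 1.0 - (1.0 - r2) * df_tot / df_res,
        "residual standard error": math.sqrt(rss / df_res),
        "SD of response": stdev(y),  # n - 1 denominator
        "F": ((tss - rss) / df_reg) / (rss / df_res),
        "df (regression, residual, total)": (df_reg, df_res, df_tot),
    }


y = [1.1, 1.9, 3.2, 3.8, 5.0]
y_hat = [1.06, 2.03, 3.00, 3.97, 4.94]  # least-squares fit 0.09 + 0.97 * x for x = 1..5
summary = linear_model_summary(y, y_hat, k=1)
for name, value in summary.items():
    print(name, value)
```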

Full Versus Reduced Model Regression

If a full versus reduced model was used for the regression equation, the following model statistics are displayed for both normal and stepwise regression:

• Name of the response variable.

• Coefficient of determination $R^2$ for the full model.

• Coefficient of determination $R^2$ for the reduced model.

• Adjusted $R^2$ for the full model. This statistic is meant to compensate for many regressors, each explaining small portions of the variation by chance alone.

• Sample size.

• Residual standard error.

• Unbiased standard deviation of the response.

• Value of the F-statistic for the full model.

• Value of the F-statistic for the full versus reduced model.

• P-value of the F-statistic for the full regression model.

• P-value of the F-statistic for the full versus reduced regression model.

• Single-value permuted p-value, if single-value permutation testing was selected.

• Full-scan permuted p-value, if full-scan permutation testing was selected.

• Number of permutations, if permutation testing was selected.

• Regression degrees of freedom of the full model.

• Regression degrees of freedom of the reduced model.

• Residual degrees of freedom of the full model.

• Total degrees of freedom of the full model.

• Y-intercept of the full model.

• Full-model intercept standard error.

• Y-intercept of the reduced model.

Logistic Regression Model Statistics

Full Model Only Regression

If only a full model was used for the regression equation, the following model statistics are displayed for both normal and stepwise regression:

• Name of the response variable.

• Regression likelihood.

• Null model likelihood.

• Sample size.

• Value of the Chi-Squared ($\chi^2$) statistic.

• P-value of the Chi-Squared statistic for the regression model.

• Single-value permuted p-value, if single-value permutation testing was selected.

• Full-scan permuted p-value, if full-scan permutation testing was selected.

• Number of permutations, if permutation testing was selected.

• Regression degrees of freedom.

• Residual degrees of freedom.

• Total degrees of freedom.

• $\beta_0$.

• $\beta_0$ standard error.
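For logistic regression, the chi-squared statistics are likelihood-ratio statistics: twice the difference in log-likelihood between the larger model and the nested smaller one, with degrees of freedom equal to the number of extra coefficients. A minimal sketch, assuming the Full Model chi-squared compares against an intercept-only null model and that predicted probabilities are already available (fitting is not shown). Whether the viewer's “likelihood” entries are log-likelihoods or another scale is not stated, so the sketch works with log-likelihoods; a p-value would come from the chi-squared distribution (for example `scipy.stats.chi2.sf`).

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def likelihood_ratio_chi2(y, p_larger, p_smaller):
    """2 * (logL of larger model - logL of nested smaller model)."""
    return 2.0 * (log_likelihood(y, p_larger) - log_likelihood(y, p_smaller))


y = [1, 0, 1, 1, 0, 0]
p_null = [sum(y) / len(y)] * len(y)      # intercept-only model: predicts the case fraction
p_full = [0.8, 0.3, 0.7, 0.9, 0.2, 0.1]  # made-up stand-in for a fitted model's probabilities
chi2 = likelihood_ratio_chi2(y, p_full, p_null)
print(f"chi-squared = {chi2:.4f}")  # chi-squared = 5.5771
```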

Full Versus Reduced Model Regression

If a full versus reduced model was used for the regression equation, the following model statistics are displayed for both normal and stepwise regression:

• Name of the response variable.

• Full model likelihood.

• Reduced model likelihood.

• Chi-squared ($\chi^2$) statistic of the full model.

• Chi-squared statistic of the full versus reduced model.

• P-value of the Chi-Squared statistic for the full regression model.

• P-value of the Chi-Squared statistic for the full versus reduced regression model.

• Single-value permuted p-value, if single-value permutation testing was selected.

• Full-scan permuted p-value, if full-scan permutation testing was selected.

• Number of permutations, if permutation testing was selected.

• Regression degrees of freedom of the full model.

• Regression degrees of freedom of the reduced model.

• Residual degrees of freedom of the full model.

• Total degrees of freedom of the full model.

• Intercept of the full model.

• Standard error of the intercept for the full model.

• Intercept of the reduced model.
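The likelihood ratio statistic for the full versus reduced comparison is twice the difference between the full-model and reduced-model log-likelihoods. It is referred to a chi-squared distribution whose degrees of freedom equal the difference in regression degrees of freedom. The standard-library sketch below assumes integer degrees of freedom and uses the closed-form chi-squared tail probabilities for that case. `chi2_sf` and `lr_test` are hypothetical names, not SVS functions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: two-sided normal tail plus a finite series in chi = sqrt(x).
    chi = math.sqrt(x)
    normal_tail = math.erfc(chi / math.sqrt(2.0))        # equals 2 * (1 - Phi(chi))
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal density at chi
    term, series = chi, 0.0                              # r = 1 term: chi / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)                      # chi^(2r-1) / (1*3*...*(2r-1))
        series += term
    return normal_tail + 2.0 * pdf * series


def lr_test(loglik_full, loglik_reduced, df_full, df_reduced):
    """Likelihood ratio statistic and p-value for nested logistic models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_full - df_reduced)


stat, p_value = lr_test(-10.0, -12.5, df_full=3, df_reduced=1)
```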

Linear Model Regressor Statistics¶

For both full-model-only and full versus reduced linear regressions, the Y-intercept of the full model is displayed. For full versus reduced linear regression models, the Y-intercepts of both the full and reduced models are displayed.

The following statistics are displayed for each regressor:

• Name

• Coefficient

• Standard error

• T-statistic for adding this regressor

• P-value for adding this regressor

• Univariate fit p-value
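The per-regressor coefficient, standard error, and t-statistic can be sketched as follows. The sketch assumes NumPy and assumes that the "T-statistic for adding this regressor" is the usual Wald t-statistic from the multiple regression. Under that assumption, its square equals the partial F-statistic for adding the regressor last. The p-value would come from a Student t distribution with n - k - 1 degrees of freedom (for example, `scipy.stats.t.sf`). The univariate fit p-value would come from regressing on that regressor alone. `regressor_table` is a hypothetical name.

```python
import numpy as np


def regressor_table(X, y):
    """Coefficient, standard error and t-statistic for each non-constant regressor."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    sigma2 = float(resid @ resid) / (n - k - 1)   # residual mean square
    cov = sigma2 * np.linalg.inv(X1.T @ X1)       # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))
    t = coef / se
    return coef[1:], se[1:], t[1:]                # drop the constant term


# Illustrative synthetic data with two regressors.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 2.0 + 0.8 * X[:, 0] + rng.normal(size=40)
coef, se, t = regressor_table(X, y)
```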

Logistic Model Regressor Statistics¶

For both full-model-only and full versus reduced logistic regressions, the intercept of the full model is displayed. For full versus reduced logistic regression models, the intercepts of both the full and reduced models are displayed.

The following statistics are displayed for each regressor:

• Name

• Coefficient

• Standard error

• P-value for adding this regressor

• Odds ratio

• Univariate fit p-value

The regression odds ratio for a coefficient $\beta$ is $e^{\beta}$. It is the ratio of two odds, with all other regressors held fixed. The numerator is the odds that the dependent variable is one (“true”) when the given regressor is increased by one unit. The denominator is the odds that the dependent variable is one when the regressor has its current value.
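The identity between $e^{\beta}$ and the ratio of odds one unit apart can be checked numerically. The standard-library sketch below uses a single-regressor logistic model. The coefficient values are illustrative, not SVS output, and the function names are hypothetical.

```python
import math


def odds(intercept, coef, x):
    """Odds p / (1 - p) under a one-regressor logistic model."""
    p = 1.0 / (1.0 + math.exp(-(intercept + coef * x)))
    return p / (1.0 - p)


def odds_ratio(coef):
    """Odds ratio for a one-unit increase in the regressor."""
    return math.exp(coef)


b0, b1 = -1.2, 0.7   # illustrative coefficients
ratio = odds(b0, b1, 3.0) / odds(b0, b1, 2.0)   # regressor raised from 2 to 3
```

The ratio does not depend on the starting value of the regressor, which is why a single odds ratio can be reported per coefficient.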

Left Out Regressors¶

This list includes all regressors excluded from the final model of a stepwise regression.

Moving Window Regressors¶

When the Use a moving window of regressors option is selected, this lists the regressors used at this moving window position.
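The sketch below illustrates how the regressors at each window position might be enumerated. It assumes a window of `width` consecutive columns advanced by `step` columns, with no partial window at the end. The exact window settings are chosen in the Numeric Regression dialog and may behave differently. `moving_windows` and the column names are hypothetical.

```python
def moving_windows(columns, width, step=1):
    """List the column names in each window position (assumed: no partial final window)."""
    return [columns[i:i + width] for i in range(0, len(columns) - width + 1, step)]


cols = ["c1", "c2", "c3", "c4", "c5"]   # hypothetical numeric column names
windows = moving_windows(cols, width=3)
```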

Column Regressor¶

When the Regress once on each column option is selected, this lists the column position used for this regression.

Predictor¶

When the Regress on covariate-column interactions (on ### numeric columns) option is selected, this lists the predictor used to create the covariate-column interactions for this regression.
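A first-order interaction between two numeric variables is conventionally their element-wise product. The sketch below builds interaction columns from one predictor and several numeric covariates, assuming SVS forms covariate-column interactions in this conventional way. Categorical covariates would first need indicator coding, which is not shown. `interaction_columns` and the covariate names are hypothetical.

```python
def interaction_columns(predictor, covariates):
    """Element-wise products of one predictor column with each numeric covariate."""
    result = {}
    for name, values in covariates.items():
        if len(values) != len(predictor):
            raise ValueError(f"covariate {name!r} has a different length than the predictor")
        result[name] = [p * c for p, c in zip(predictor, values)]
    return result


terms = interaction_columns([1.0, 2.0, 3.0], {"age": [10.0, 20.0, 30.0], "dose": [0.0, 1.0, 1.0]})
```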

2.22.5. Caveats for Logistic Regression¶

Under some circumstances, the iteration procedure for the logistic regression algorithm will be unstable, and the regression may fail, even when the coefficient matrix has sufficient rank and significant regressors are included. This can arise when the regression algorithm tries to emulate a step function. It can also arise when the algorithm tries to accommodate independent values for which the dependent variable is exclusively 1 or exclusively 0, a condition known as complete or quasi-complete separation.
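Under complete separation, no finite maximum-likelihood estimate exists, so Newton-type iterations keep pushing the coefficient larger. The standard-library sketch below shows this on separated data. For brevity it uses a hypothetical no-intercept, single-regressor model, and it is not the SVS algorithm. In this example the slope grows by roughly one unit per iteration and soon passes the magnitudes mentioned in the workaround below.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def newton_slope_path(x, y, iterations):
    """Newton-Raphson on a no-intercept logistic model; returns the slope after each step."""
    beta, path = 0.0, []
    for _ in range(iterations):
        grad = hess = 0.0
        for xi, yi in zip(x, y):
            s = beta * xi
            # y - p without cancellation, using 1 - sigmoid(s) == sigmoid(-s)
            resid = sigmoid(-s) if yi == 1 else -sigmoid(s)
            grad += resid * xi
            hess += sigmoid(s) * sigmoid(-s) * xi * xi
        beta += grad / hess
        path.append(beta)
    return path


# Completely separated data: every x < 0 has y = 0 and every x > 0 has y = 1.
path = newton_slope_path([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1], iterations=25)
```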

If a stepwise regression model approach is used, similar circumstances resulting in instability may cause “paradoxical” phenomena such as:

• The final regression used to obtain the model statistics fails, even though it is “the same as” the last model tried by the stepwise regression algorithm. This can happen because the regressors in the final model may be entered in a different order than in the last model tried during stepwise regression. If the problem is highly unstable, the different order may be enough to cause failure.

• For some regressors, the p-value Pr(Chi) associated with dropping the regressor from the regression is reported as exactly 1. This happens when the regression fails after the regressor is removed, and it is only possible for a regressor other than the last one added to the model.

The best workaround is to filter out the data causing such instabilities. The data should be filtered in either of two situations. The first is when a covariate's coefficient exceeds roughly 15 to 20 in absolute value and the regressors selected by a stepwise regression fail to regress when entered directly. The second is when a particular covariate fails to regress on its own. Consider creating row subset spreadsheets based on ranges of covariate values and running the desired regression model on each subset. Alternatively, consider stepwise regression if it is not already being used. If stepwise regression is failing, switching the method from forward selection to backward elimination, or vice versa, may resolve the problem.
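Before filtering, it can help to check whether a single covariate separates the response on its own. For one numeric covariate with a constant term, the rules are as follows. Separation is complete when every value in one response group lies strictly below every value in the other group. It is quasi-complete when the two groups touch only at a shared boundary value. The standard-library helper below applies those rules. `separation_status` is a hypothetical name, and SVS does not necessarily perform this check.

```python
def separation_status(x, y):
    """Classify separation of a 0/1 response by a single numeric covariate."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return "constant response"
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"          # groups strictly ordered by the covariate
    if max(x0) <= min(x1) or max(x1) <= min(x0):
        return "quasi-complete"    # groups overlap only at a boundary value
    return "none"
```

A covariate reported as "complete" or "quasi-complete" is a natural candidate for the range-based row subsetting described above.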