regression_opt {mantar}    R Documentation

Stepwise Multiple Regression Search based on Information Criteria

Description

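Runs a stepwise search over multiple regression models for a single dependent variable and selects the set of predictors that optimizes an information criterion (BIC by default, AIC with k = 2). Works with raw data, including data with missing values, or with a covariance or correlation matrix.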

Usage

regression_opt(
  data = NULL,
  n = NULL,
  mat = NULL,
  dep_ind,
  n_calc = "individual",
  missing_handling = "stacked-mi",
  k = "log(n)",
  nimp = 20
)

Arguments

data

Raw data containing only the variables to be tested within the multiple regression as dependent or independent variable. May include missing values.

n

Numeric value specifying the sample size used in calculating information criteria for model search. If not provided, it will be computed based on the data. If a correlation matrix (mat) is supplied instead of raw data, n must be provided.

mat

Optional covariance or correlation matrix for the variables to be used within the multiple regression. Used only if data is NULL.
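To see why a correlation matrix is enough to fit a regression, the sketch below computes standardized coefficients and R-squared directly from a correlation matrix and checks them against lm() on standardized raw data. It uses only base R and the built-in mtcars data. The variable choice and object names are assumptions made for illustration, and this is not necessarily how regression_opt computes its results internally.

```r
# Illustration only: standardized regression from a correlation matrix.
vars <- c("mpg", "wt", "hp", "qsec")
R <- cor(mtcars[, vars])

dep_ind <- 1                                  # dependent variable: mpg
R_xx <- R[-dep_ind, -dep_ind, drop = FALSE]   # correlations among predictors
r_xy <- R[-dep_ind, dep_ind, drop = FALSE]    # predictor-outcome correlations

beta <- solve(R_xx, r_xy)                     # standardized coefficients
R2   <- drop(crossprod(r_xy, beta))           # proportion of explained variance

# Cross-check against lm() fitted to standardized raw data
z   <- as.data.frame(scale(mtcars[, vars]))
fit <- lm(mpg ~ wt + hp + qsec, data = z)

all.equal(unname(drop(beta)), unname(coef(fit)[-1]))
all.equal(R2, summary(fit)$r.squared)
```

Because only correlations enter the computation, the sample size cannot be recovered from the matrix, which is why n has to be supplied alongside mat.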

dep_ind

Index of the column within the data set to be used as the dependent variable in the regression model.

n_calc

Method for calculating the sample size for node-wise regression models. Can be one of: "individual" (sample size for each variable is the number of non-missing observations for that variable), "average" (sample size is the average number of non-missing observations across all variables), "max" (sample size is the maximum number of non-missing observations across all variables), "total" (sample size is the total number of observations in the data set, i.e., the number of rows).
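The four n_calc options can be illustrated with a few lines of base R. The toy data frame and object names below are hypothetical, and the code only mirrors the definitions given above; it is a sketch, not the package's implementation.

```r
# Hypothetical toy data with missing values (illustration only)
toy <- data.frame(
  y  = c(1.2, NA, 3.1, 4.0, 5.3, NA),
  x1 = c(0.5, 1.1, NA, 2.2, 2.9, 3.4),
  x2 = c(2.0, 1.8, 1.5, 1.1, 0.9, 0.4)
)

n_obs <- colSums(!is.na(toy))   # non-missing observations per variable

n_individual <- n_obs           # "individual": one sample size per variable
n_average    <- mean(n_obs)     # "average": mean across variables
n_max        <- max(n_obs)      # "max": largest per-variable count
n_total      <- nrow(toy)       # "total": number of rows

n_individual
c(average = n_average, max = n_max, total = n_total)
```

With "individual" and dep_ind = 1, the relevant value in this sketch would be n_individual["y"].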

missing_handling

Method for estimating the correlation matrix in the presence of missing data. "two-step-em" uses a classic EM algorithm to estimate the covariance matrix from the data. "stacked-mi" uses multiple imputation to estimate the covariance matrix from the data. "pairwise" uses pairwise deletion to estimate the covariance matrix from the data. "listwise" uses listwise deletion to estimate the covariance matrix from the data.
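The difference between pairwise and listwise deletion can be shown with base R's cov() and its use argument. The toy data below are hypothetical, and the sketch covers only the two deletion-based options; the EM and multiple-imputation options need additional machinery and are not reproduced here. Whether regression_opt calls cov() in this way is an assumption.

```r
# Hypothetical toy data with missing values (illustration only)
toy <- data.frame(
  y  = c(1.2, NA, 3.1, 4.0, 5.3, NA),
  x1 = c(0.5, 1.1, NA, 2.2, 2.9, 3.4),
  x2 = c(2.0, 1.8, 1.5, 1.1, 0.9, 0.4)
)

# Listwise deletion: keep only rows that are complete on every variable
cov_listwise <- cov(toy, use = "complete.obs")

# Pairwise deletion: each entry uses all rows complete for that pair
cov_pairwise <- cov(toy, use = "pairwise.complete.obs")

sum(complete.cases(toy))   # rows available under listwise deletion
cov_listwise
cov_pairwise
```

Pairwise deletion uses more of the data per entry, but the resulting matrix is not guaranteed to be positive definite, which is one reason to consider the EM or multiple-imputation options.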

k

Penalty per parameter (number of predictors + 1) to be used in node-wise regressions; the default log(n), where n is the number of observations, gives the classical BIC. Alternatively, k = 2 gives the classical AIC.
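The role of the penalty k can be checked in base R, where an information criterion has the form -2 * logLik + k * (number of parameters). The sketch below uses lm() on the built-in mtcars data; the helper ic() is hypothetical. Note that logLik() for lm counts the residual variance as a parameter, so the parameter count may differ from the one regression_opt uses. The final call uses base R's step() as a stand-in for a stepwise search and is not the package's own search routine.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nobs(fit)

# Hypothetical helper: generic information criterion with penalty k
ic <- function(model, k) {
  ll <- logLik(model)
  -2 * as.numeric(ll) + k * attr(ll, "df")
}

all.equal(ic(fit, k = 2), AIC(fit))         # k = 2 gives AIC
all.equal(ic(fit, k = log(n)), BIC(fit))    # k = log(n) gives BIC
all.equal(AIC(fit, k = log(n)), BIC(fit))   # same identity via stats::AIC

# Stepwise search with a BIC-type penalty, starting from the full model
full <- lm(mpg ~ ., data = mtcars)
sel  <- step(full, k = log(nrow(mtcars)), trace = 0)
formula(sel)
```

A larger k penalizes each additional predictor more heavily, so the BIC-type default tends to select sparser models than k = 2.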

nimp

Number of multiple imputations to perform when using multiple imputation for missing data (default: 20).

Value

A list with the following elements:

regression

Named vector of regression coefficients for the dependent variable.

R2

R-squared value of the regression model.

n

Sample size used in the regression model.

args

List of arguments used in the regression model, including k, missing_handling, and nimp.

Examples

# For full data using AIC
# First variable of the data set as dependent variable
result <- regression_opt(
  data = mantar_dummy_full,
  dep_ind = 1,
  k = "2"
)

# View regression coefficients and R-squared
result$regression
result$R2

# For data with missingness using BIC
# Second variable of the data set as dependent variable
# Using individual sample size of the dependent variable and the two-step EM algorithm

result_mis <- regression_opt(
  data = mantar_dummy_mis,
  dep_ind = 2,
  n_calc = "individual",
  missing_handling = "two-step-em"
)

# View regression coefficients and R-squared
result_mis$regression
result_mis$R2

[Package mantar version 0.1.0 Index]