SVEMnet {SVEMnet}    R Documentation

Fit an SVEMnet Model

Description

Wrapper for 'glmnet' (Friedman et al. 2010) to fit an ensemble of Elastic Net models using the Self-Validated Ensemble Model method (SVEM, Lemkus et al. 2021). Allows searching over multiple alpha values in the Elastic Net penalty.

Usage

SVEMnet(
  formula,
  data,
  nBoot = 200,
  glmnet_alpha = c(0, 0.5, 1),
  weight_scheme = c("SVEM", "FWR", "Identity"),
  objective = c("wAIC", "wSSE"),
  standardize = TRUE,
  ...
)

Arguments

formula

A formula specifying the model to be fitted.

data

A data frame containing the variables in the model.

nBoot

Number of bootstrap iterations (default is 200).

glmnet_alpha

Elastic Net mixing parameter(s) (default is c(0, 0.5, 1)). Can be a vector of alpha values, where alpha = 1 corresponds to Lasso and alpha = 0 corresponds to Ridge regression.

weight_scheme

Weighting scheme for SVEM (default is "SVEM"). Valid options are "SVEM", "FWR", and "Identity". "FWR" performs Fractional Weight Regression (Xu et al., 2020) and is included for demonstration; "SVEM" generally provides better performance. "Identity" simply sets the training and validation weights to 1. Use it with nBoot = 1 and objective = "wAIC" to obtain a single AIC-selected Elastic Net fit on the training data.

objective

Objective function for selecting lambda (default is "wAIC"). Valid options are "wAIC" and "wSSE". The "w" refers to "weighted" validation.

standardize

Logical; passed to glmnet() to control standardization of the predictors (default is TRUE).

...

Additional arguments passed to the underlying glmnet() function.

Details

The Self-Validated Ensemble Model (SVEM, Lemkus et al., 2021) framework provides a bootstrap approach to improve predictions from various base learning models, including Elastic Net regression as implemented in 'glmnet'. SVEM is particularly suited for situations where a complex response surface is modeled with relatively few experimental runs.

In each of the 'nBoot' iterations, SVEMnet applies random exponentially distributed weights to the training observations; the validation weights are anti-correlated with the training weights.
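
A minimal sketch of one way to construct such anti-correlated weight pairs from a shared uniform draw (an assumed construction for illustration; the package's internal scheme follows Lemkus et al., 2021):

n <- 21
u <- runif(n)
w_train <- -log(u)       # Exp(1)-distributed training weights
w_valid <- -log(1 - u)   # anti-correlated validation weights
cor(w_train, w_valid)    # strongly negative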

SVEMnet allows the Elastic Net mixing parameter ('glmnet_alpha') to be a vector, enabling a search over multiple 'alpha' values within each bootstrap iteration: the model is fit for each specified 'alpha', and the best 'alpha' is selected based on the specified 'objective'.
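
To make the per-iteration selection concrete, the following illustrative sketch performs the search directly with glmnet, scoring each lambda on the anti-correlated validation weights via a weighted SSE; it mirrors the logic described here but is not the package's internal code:

library(glmnet)
set.seed(1)
n <- 30
x <- matrix(runif(n * 3), n, 3)
y <- 1 + x[, 1] + rnorm(n)
u <- runif(n)
w_train <- -log(u)
w_valid <- -log(1 - u)
best <- NULL
for (a in c(0, 0.5, 1)) {
  fit  <- glmnet(x, y, alpha = a, weights = w_train)
  pred <- predict(fit, x)                  # n x length(lambda) predictions
  wsse <- colSums(w_valid * (y - pred)^2)  # weighted validation SSE per lambda
  if (is.null(best) || min(wsse) < best$wsse) {
    best <- list(alpha = a, lambda = fit$lambda[which.min(wsse)],
                 wsse = min(wsse))
  }
}
best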

objective options:

"wSSE"

Weighted Sum of Squared Errors. Selects the lambda that minimizes the weighted validation error without penalizing model complexity. Although this can lead to overfit models when the number of parameters is large relative to the number of observations, SVEM mitigates the resulting prediction variance by averaging over multiple bootstrap models. This is the objective function used by Lemkus et al. (2021) with weight_scheme = "SVEM".

"wAIC"

Weighted Akaike Information Criterion. Balances model fit against complexity by penalizing the number of parameters. It is calculated as AIC = n * log(wSSE / n) + 2 * k, where wSSE is the weighted sum of squared errors, n is the number of observations, and k is the number of parameters with nonzero coefficients. Typically used with weight_scheme = "FWR" or weight_scheme = "Identity".
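
Transcribed directly from the formula above, the criterion is a one-liner (a minimal sketch):

wAIC <- function(wSSE, n, k) n * log(wSSE / n) + 2 * k
wAIC(wSSE = 10, n = 21, k = 4)  # smaller values are preferred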

weight_scheme options:

"SVEM"

Uses anti-correlated fractional weights for the training and validation sets, improving model generalization by effectively simulating multiple training-validation splits (Lemkus et al., 2021). Published results (Lemkus et al., 2021; Karl, 2024) use objective = "wSSE"; however, unpublished simulation results suggest improved performance from objective = "wAIC" with weight_scheme = "SVEM". See the SVEMnet vignette for details.

"FWR"

Fractional Weight Regression as described by Xu et al. (2020). Weights are the same for both training and validation sets. This method does not provide the self-validation benefits of SVEM but is included for comparison. Used with objective="wAIC".

"Identity"

Uses weights of 1 for both training and validation, so the full dataset serves as both and the self-validation mechanism is effectively disabled. Use with objective = "wAIC" and nBoot = 1 to obtain a single Elastic Net fit selected by AIC on the training data.
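
For example, such a single AIC-selected fit can be requested as follows (a usage sketch with simulated data):

set.seed(1)
d <- data.frame(y = rnorm(20), X1 = runif(20), X2 = runif(20))
# With unit weights, the weighted AIC reduces to the ordinary AIC.
fit_single <- SVEMnet(y ~ X1 + X2, data = d, nBoot = 1,
                      weight_scheme = "Identity", objective = "wAIC")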

A debiased fit is returned along with the standard fit. This is provided to allow the user to match the output of JMP, which returns a debiased fit whenever nBoot >= 10 (see https://www.jmp.com/support/help/en/18.1/?utm_source=help&utm_medium=redirect#page/jmp/overview-of-selfvalidated-ensemble-models.shtml). The debiasing coefficients are always calculated by SVEMnet(), and the predict() function determines whether raw or debiased predictions are returned via its debias argument. The default is debias = FALSE, based on performance in unpublished simulations.
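
At prediction time the choice looks like this (continuing the fit_single sketch above):

p_raw      <- predict(fit_single, d)                 # default: debias = FALSE
p_debiased <- predict(fit_single, d, debias = TRUE)  # apply debiasing coefficients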


Value

An object of class svem_model: a list containing the fitted ensemble of models and the debiasing coefficients described in Details.

Acknowledgments

Development of this package was assisted by GPT o1-preview, which helped construct the structure of some of the code and the roxygen documentation. The code for the significance test is taken from the supplementary material of Karl (2024) and was handwritten by that author.

References

Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.

Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference. https://community.jmp.com/t5/Discovery-Summit-2018/Model-Validation-Strategies-for-Designed-Experiments-Using/ta-p/73730

Karl, A. T. (2024). A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM). Chemometrics and Intelligent Laboratory Systems, 249, 105122. doi:10.1016/j.chemolab.2024.105122

Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. JMP Discovery Conference. doi:10.13140/RG.2.2.34598.40003/1

Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Design of Experiments. Chemometrics and Intelligent Laboratory Systems, 219, 104439. doi:10.1016/j.chemolab.2021.104439

Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space Filling Mixture Designs, Neural Networks and SVEM. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Accelerating-Innovation-with-Space-Filling-Mixture-Designs/ev-p/756841

Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Discovery-Summit-Europe-2018/Model-Validation-Strategies-for-Designed-Experiments-Using/ta-p/51286

Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/SVEM-A-Paradigm-Shift-in-Design-and-Analysis-of-Experiments-2021/ev-p/756634

Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It’s All About Prediction. JMP Discovery Conference.

Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358. doi:10.1080/00031305.2020.1731599

Examples

# Simulate data
set.seed(0)
n <- 21
X1 <- runif(n)
X2 <- runif(n)
X3 <- runif(n)
y <- 1 + 2*X1 + 3*X2 + X1*X2 + X1^2 + rnorm(n)
data <- data.frame(y, X1, X2, X3)

# Fit the SVEMnet model with a formula
model <- SVEMnet(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2) + I(X3^2),
  glmnet_alpha = c(1),
  data = data
)
coef(model)
plot(model)
predict(model, data)
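
# Debiased predictions (the debiasing coefficients are always computed;
# see Details for the default of debias = FALSE)
predict(model, data, debias = TRUE)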


[Package SVEMnet version 1.3.0 Index]