spar.cv {spareg} | R Documentation |
Sparse Projected Averaged Regression
Description
Apply Sparse Projected Averaged Regression to high-dimensional data, where the number of models and the threshold parameter are chosen using a cross-validation procedure.
Usage
spar.cv(
x,
y,
family = gaussian("identity"),
model = spar_glmnet(),
rp = NULL,
screencoef = NULL,
nfolds = 10,
nnu = 20,
nus = NULL,
nummods = c(20),
measure = c("deviance", "mse", "mae", "class", "1-auc"),
avg_type = c("link", "response"),
parallel = FALSE,
seed = NULL,
...
)
spareg.cv(
x,
y,
family = gaussian("identity"),
model = spar_glmnet(),
rp = NULL,
screencoef = NULL,
nfolds = 10,
nnu = 20,
nus = NULL,
nummods = c(20),
measure = c("deviance", "mse", "mae", "class", "1-auc"),
avg_type = c("link", "response"),
parallel = FALSE,
seed = NULL,
...
)
Arguments
x |
n x p numeric matrix of predictor variables. |
y |
quantitative response vector of length n. |
family |
a |
model |
function creating a |
rp |
function creating a |
screencoef |
function creating a |
nfolds |
number of folds to use for cross-validation; should be at least 2, defaults to 10. |
nnu |
number of different threshold values |
nus |
optional vector of |
nummods |
vector of numbers of marginal models to consider for
validation; defaults to |
measure |
loss to use for validation; defaults to |
avg_type |
type of averaging the marginal models; either on link (default) or on response level. This is used in computing the validation measure. |
parallel |
assuming a parallel backend is loaded and available, a logical indicating whether the function should use it in parallelizing the estimation of the marginal models. Defaults to FALSE. |
seed |
integer seed to be set at the beginning of the SPAR algorithm. Defaults to NULL, in which case no seed is set. |
... |
further arguments, mainly to ensure backward compatibility |
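The two avg_type options differ only when the link function is nonlinear. The following base R sketch illustrates the distinction; it is not the package's internal code. It assumes a binomial family with logit link, and the three linear predictors are made-up values standing in for the output of three marginal models for one observation.

```r
# Hypothetical linear predictors (link scale) from three marginal models
# for a single observation; the values are made up for illustration.
eta <- c(-2, 0, 3)

fam <- binomial("logit")

# avg_type = "link": average the linear predictors, then apply the inverse link.
pred_link <- fam$linkinv(mean(eta))

# avg_type = "response": apply the inverse link per model, then average.
pred_response <- mean(fam$linkinv(eta))

print(c(link = pred_link, response = pred_response))

# With the identity link of gaussian("identity") both routes coincide.
gaus <- gaussian("identity")
same_for_identity <- isTRUE(all.equal(gaus$linkinv(mean(eta)),
                                      mean(gaus$linkinv(eta))))
print(same_for_identity)
```

For the default family = gaussian("identity") the choice of avg_type therefore should not change the averaged prediction; it matters for families such as binomial or poisson with their usual links.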
Value
object of class 'spar.cv' with elements
- betas: p x max(nummods) sparse matrix of class 'Matrix::dgCMatrix' containing the standardized coefficients from each marginal model computed with the spar algorithm on the whole training data.
- intercepts: intercepts used in each marginal model, vector of length max(nummods) computed with the spar algorithm on the whole training data.
- scr_coef: p-vector of coefficients used for screening for standardized predictors
- inds: list of index-vectors corresponding to variables kept after screening in each marginal model of length max(nummods)
- RPMs: list of projection matrices used in each marginal model of length max(nummods)
- val_res: a data.frame with CV results for each fold and for each element of nus and nummods
- nus: vector of \nu's considered for thresholding
- nummods: vector of numbers of marginal models considered for validation
- family: a character corresponding to family object used for the marginal generalized linear model e.g., "gaussian(identity)"
- measure: character, type of validation measure used
- avg_type: character, averaging type for computing the validation measure
- rp: an object of class 'randomprojection'
- screencoef: an object of class 'screeningcoef'
- model: an object of class 'sparmodel'
- ycenter: empirical mean of initial response vector
- yscale: empirical standard deviation of initial response vector
- xcenter: p-vector of empirical means of initial predictor variables
- xscale: p-vector of empirical standard deviations of initial predictor variables
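Because betas are described as standardized coefficients and the centering and scaling constants are returned alongside them, it can help to see how such coefficients map back to the original scale. The base R sketch below shows the general linear-model identity only. It uses ordinary least squares as a stand-in model, and it assumes that both predictors and response are standardized; the exact convention spareg applies (in particular for non-Gaussian families) is not stated on this page, so coef() remains the reliable way to extract coefficients.

```r
set.seed(1)
n <- 50
p <- 3
x <- matrix(rnorm(n * p, mean = 5, sd = 2), n, p)
y <- drop(1 + x %*% c(2, -1, 0.5) + rnorm(n))

# Centering and scaling constants, analogous to xcenter, xscale, ycenter, yscale.
xcenter <- colMeans(x)
xscale  <- apply(x, 2, sd)
ycenter <- mean(y)
yscale  <- sd(y)

z  <- scale(x, center = xcenter, scale = xscale)
ys <- (y - ycenter) / yscale

# Fit on the standardized scale (plain least squares as a stand-in model).
fit_std   <- lm(ys ~ z)
alpha_std <- unname(coef(fit_std)[1])
beta_std  <- unname(coef(fit_std)[-1])

# Map the standardized coefficients back to the original scale.
beta_orig  <- yscale * beta_std / xscale
alpha_orig <- ycenter + yscale * alpha_std - sum(beta_orig * xcenter)

# The result agrees with a direct fit on the raw data.
fit_raw  <- lm(y ~ x)
max_diff <- max(abs(c(alpha_orig, beta_orig) - unname(coef(fit_raw))))
print(max_diff)
```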
See Also
spar, coef.spar.cv, predict.spar.cv, plot.spar.cv, print.spar.cv
Examples
example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)
spar_res <- spar.cv(example_data$x, example_data$y, nfolds = 3L,
nummods = c(5, 10, 15, 20, 25, 30))
spar_res
coefs <- coef(spar_res)
pred <- predict(spar_res, example_data$x)
plot(spar_res)
plot(spar_res, plot_type = "val_measure", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_measure", plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "val_numactive", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_numactive", plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "res_vs_fitted", xfit = example_data$xtest,
yfit = example_data$ytest, opt_par = "1se")
plot(spar_res, "coefs", prange = c(1, 400))
spar_res <- spareg.cv(example_data$x, example_data$y,
nummods = c(5, 10, 15, 20, 25, 30))
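The val_res element is described as a data.frame with CV results per fold and per combination of nus and nummods. A minimal base R sketch of summarizing a table of that shape is shown below. The column names (fold, nu, nummod, measure) and all values are hypothetical and only stand in for the real object, whose layout should be checked with str(spar_res$val_res).

```r
# Toy stand-in for a per-fold CV table; column names and values are hypothetical.
set.seed(42)
cv_tab <- expand.grid(fold = 1:3, nu = c(0, 0.1, 0.2), nummod = c(5, 10))
cv_tab$measure <- (cv_tab$nu - 0.1)^2 + 1 / cv_tab$nummod +
  rnorm(nrow(cv_tab), sd = 0.001)

# Mean validation measure per (nu, nummod) combination across folds.
cv_mean <- aggregate(measure ~ nu + nummod, data = cv_tab, FUN = mean)

# Combination with the smallest mean validation measure.
best <- cv_mean[which.min(cv_mean$measure), ]
print(best)
```

This only picks the minimizer of the mean measure; rules such as opt_par = "1se" used in the plotting example above additionally take the fold-to-fold variability into account.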