spar.cv {spareg}    R Documentation

Sparse Projected Averaged Regression

Description

Apply Sparse Projected Averaged Regression to high-dimensional data, where the number of models and the threshold parameter are chosen by cross-validation.

Usage

spar.cv(
  x,
  y,
  family = gaussian("identity"),
  model = spar_glmnet(),
  rp = NULL,
  screencoef = NULL,
  nfolds = 10,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  parallel = FALSE,
  seed = NULL,
  set.seed.iteration = FALSE,
  ...
)

spareg.cv(
  x,
  y,
  family = gaussian("identity"),
  model = spar_glmnet(),
  rp = NULL,
  screencoef = NULL,
  nfolds = 10,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  parallel = FALSE,
  seed = NULL,
  set.seed.iteration = FALSE,
  ...
)

Arguments

x

n x p numeric matrix of predictor variables.

y

quantitative response vector of length n.

family

a 'family' object used for the marginal generalized linear model; defaults to gaussian("identity").

model

function creating a 'sparmodel' object; defaults to spar_glm() for gaussian family with identity link and to spar_glmnet() for all other family-link combinations.

rp

function creating a 'randomprojection' object.

screencoef

function creating a 'screeningcoef' object.

nfolds

number of folds to use for cross-validation; should be at least 2, defaults to 10.

nnu

number of different threshold values \nu to consider for thresholding; ignored when nus is provided; defaults to 20.

nus

optional vector of \nu's to consider for thresholding; if not provided, nnu values ranging from 0 to the maximum absolute marginal coefficient are used.

nummods

vector of numbers of marginal models to consider for validation; defaults to c(20).

measure

loss to use for validation; defaults to "deviance", which is available for all families. Other options are "mse" or "mae" (computed between responses and predicted means, available for all families), and "class" (misclassification error) or "1-auc" (one minus the area under the ROC curve), both available only for the binomial family.

parallel

a logical indicating whether the estimation of the marginal models should be parallelized; this requires that a parallel backend is loaded and available. Defaults to FALSE.

seed

integer seed to be set at the beginning of the SPAR algorithm. Defaults to NULL, in which case no seed is set.

set.seed.iteration

a boolean indicating whether a different seed should be set in each marginal model i. Defaults to FALSE. If TRUE, seed will be set to seed + i in each marginal model i.

...

further arguments, mainly to ensure backward compatibility.
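
The tuning grid described by nfolds, nnu, nus and set.seed.iteration can be pictured with a short base-R sketch. This is an illustration under stated assumptions, not the package's implementation: the marginal coefficients are replaced by simple per-column regression slopes, the \nu grid is assumed to be equally spaced (the text above only gives its range), and the names foldid, marg_coef and model_seeds are hypothetical.

```r
# Illustrative sketch in base R; not the spareg implementation.
set.seed(1)
n <- 50
p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)

# Balanced, randomly permuted fold labels for nfolds-fold cross-validation.
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Stand-in for the marginal coefficients: slope of y on each column of x.
# (Assumption for illustration; the package builds these via 'screencoef'.)
marg_coef <- as.vector(cov(x, y)) / apply(x, 2, var)

# Threshold grid: nnu values from 0 to the maximum absolute coefficient.
# Equal spacing is an assumption; only the range is documented.
nnu <- 20
nus <- seq(0, max(abs(marg_coef)), length.out = nnu)

# Seeds per marginal model i when set.seed.iteration = TRUE: seed + i.
seed <- 123
nummod <- 5
model_seeds <- seed + seq_len(nummod)
```

Each pair of a \nu value and a number of models from nummods would then be scored on the held-out folds with the chosen measure.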

Value

object of class 'spar.cv' with elements

See Also

spar, coef.spar.cv, predict.spar.cv, plot.spar.cv, print.spar.cv

Examples


example_data <- simulate_spareg_data(n = 200, p = 2000, ntest = 100)
spar_res <- spar.cv(example_data$x, example_data$y,
  nummods = c(5, 10, 15, 20, 25, 30))
spar_res
coefs <- coef(spar_res)
pred <- predict(spar_res, example_data$x)
plot(spar_res)
plot(spar_res, plot_type = "Val_Meas", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "Val_Meas", plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "Val_numAct",  plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "Val_numAct",  plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "res-vs-fitted",  xfit = example_data$xtest,
  yfit = example_data$ytest, opt_par = "1se")
plot(spar_res, "coefs", prange = c(1, 400))


spar_res <- spareg.cv(example_data$x, example_data$y,
  nummods = c(5, 10, 15, 20, 25, 30))
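
For readers comparing the options of the measure argument, the following base-R sketch shows one common way to compute "mse", "mae", "class" and "1-auc" from responses and predicted means. The function names are hypothetical, the 0.5 cutoff for "class" is an assumption, and the package's own computations may differ in detail.

```r
# Base-R sketch of the validation losses; all function names are hypothetical.

# Mean squared and mean absolute error between responses and predicted means.
mse <- function(y, mu) mean((y - mu)^2)
mae <- function(y, mu) mean(abs(y - mu))

# Misclassification error for a 0/1 response, using an assumed 0.5 cutoff.
class_err <- function(y, mu) mean(y != as.numeric(mu > 0.5))

# One minus the area under the ROC curve, via the rank (Mann-Whitney) formula.
one_minus_auc <- function(y, mu) {
  r <- rank(mu)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  1 - auc
}

# Toy binary response and predicted probabilities.
y_toy  <- c(0, 0, 1, 1)
mu_toy <- c(0.1, 0.4, 0.35, 0.8)

res <- c(mse = mse(y_toy, mu_toy), mae = mae(y_toy, mu_toy),
         class = class_err(y_toy, mu_toy), auc1 = one_minus_auc(y_toy, mu_toy))
print(res)
```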


[Package spareg version 1.0.0 Index]