spar {spareg} | R Documentation |
Sparse Projected Averaged Regression
Description
Apply Sparse Projected Averaged Regression to high-dimensional data by
building an ensemble of generalized linear models, where the high-dimensional
predictors can be screened using a screening coefficient and then projected
using data-agnostic or data-informed random projection matrices.
This function performs the procedure for a given grid of thresholds \nu
and a grid of the number of marginal models to be employed in the ensemble.
This function is also used in the cross-validated procedure spar.cv.
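As a rough illustration of the idea described above, the following base-R sketch builds a small ensemble by hand. It computes a marginal screening coefficient, samples predictors according to it, projects them with a sparse random sign matrix, fits least squares in the reduced space, and averages the back-projected coefficients. It is not the package's implementation: the helper name `sketch_spar`, the cross-product screening coefficient, and the values `nscreen = 100`, `m = 10`, and `nummod = 20` are assumptions made only for this example.

```r
# Conceptual sketch only: all names and tuning values here are hypothetical.
sketch_spar <- function(x, y, nscreen = 100, m = 10, nummod = 20) {
  p <- ncol(x)
  xs <- scale(x)                       # standardize predictors
  ys <- as.numeric(scale(y))           # standardize response
  scr <- abs(drop(crossprod(xs, ys)))  # marginal screening coefficient (assumption)
  betas <- matrix(0, nrow = p, ncol = nummod)
  for (k in seq_len(nummod)) {
    # Screening: sample predictors with probability proportional to |scr|
    ind <- sample.int(p, size = nscreen, prob = scr)
    # Sparse random projection: each screened predictor is assigned to one
    # of m reduced coordinates with a random sign
    phi <- matrix(0, nrow = m, ncol = nscreen)
    phi[cbind(sample.int(m, nscreen, replace = TRUE), seq_len(nscreen))] <-
      sample(c(-1, 1), nscreen, replace = TRUE)
    z <- xs[, ind, drop = FALSE] %*% t(phi)    # n x m reduced predictors
    gamma <- qr.coef(qr(z), ys)                # least squares in reduced space
    gamma[is.na(gamma)] <- 0                   # unused reduced coordinates
    betas[ind, k] <- drop(t(phi) %*% gamma)    # map back to the p predictors
  }
  rowMeans(betas)                              # average over marginal models
}

# Small simulated example with five active predictors
set.seed(1)
n <- 50
p <- 300
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(rep(2, 5), rep(0, p - 5))) + rnorm(n)
beta_hat <- sketch_spar(x, y)
length(beta_hat)
```

The returned coefficients are on the standardized scale, and no thresholding is applied in this sketch.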
Usage
spar(
  x,
  y,
  family = gaussian("identity"),
  model = NULL,
  rp = NULL,
  screencoef = NULL,
  xval = NULL,
  yval = NULL,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  parallel = FALSE,
  inds = NULL,
  RPMs = NULL,
  seed = NULL,
  set.seed.iteration = FALSE,
  ...
)
spareg(
  x,
  y,
  family = gaussian("identity"),
  model = NULL,
  rp = NULL,
  screencoef = NULL,
  xval = NULL,
  yval = NULL,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  parallel = FALSE,
  inds = NULL,
  RPMs = NULL,
  seed = NULL,
  set.seed.iteration = FALSE,
  ...
)
Arguments
x |
n x p numeric matrix of predictor variables. |
y |
quantitative response vector of length n. |
family |
a family object used for the marginal generalized linear model;
defaults to gaussian("identity"). |
model |
function creating a |
rp |
function creating a "randomprojection" object. |
screencoef |
function creating a "screeningcoef" object. |
xval |
optional matrix of predictor observations used for
validation of the threshold nu and the number of models; if not provided, the training data are used for validation. |
yval |
optional response observations used for validation of
the threshold nu and the number of models; if not provided, the training data are used for validation. |
nnu |
number of different threshold values \nu to consider; defaults to 20. |
nus |
optional vector of \nu's to consider for thresholding. |
nummods |
vector of numbers of marginal models to consider for
validation; defaults to c(20). |
measure |
loss to use for validation; defaults to "deviance". |
parallel |
logical indicating whether the marginal models should be estimated in parallel; this requires a parallel backend to be loaded and available. Defaults to FALSE. |
inds |
optional list of index-vectors corresponding to variables kept
after screening in each marginal model, of length max(nummods). |
RPMs |
optional list of projection matrices used in each
marginal model, of length max(nummods). |
seed |
integer seed to be set at the beginning of the SPAR algorithm. Defaults to NULL, in which case no seed is set. |
set.seed.iteration |
a boolean indicating whether a different seed should be set in each marginal model |
... |
further arguments, mainly to ensure backward compatibility |
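The measure argument names the loss used to compare candidate thresholds and numbers of models. The base-R sketch below shows common textbook definitions of four of the listed measures. The helper name `val_measure`, the 0.5 classification cutoff, and the exact formulas are assumptions for illustration, and the package may define them differently; deviance is omitted because it depends on the family.

```r
# Hypothetical helper: textbook definitions, not taken from the package.
val_measure <- function(y, pred, measure = c("mse", "mae", "class", "1-auc")) {
  measure <- match.arg(measure)
  switch(measure,
    "mse" = mean((y - pred)^2),                    # mean squared error
    "mae" = mean(abs(y - pred)),                   # mean absolute error
    "class" = mean(y != as.numeric(pred > 0.5)),   # misclassification rate
    "1-auc" = {
      # AUC via the rank-sum (Mann-Whitney) formula for 0/1 responses
      r <- rank(pred)
      n1 <- sum(y == 1)
      n0 <- sum(y == 0)
      auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
      1 - auc
    }
  )
}

# Binary responses with predicted probabilities
y_bin <- c(0, 0, 1, 1)
p_hat <- c(0.1, 0.4, 0.35, 0.8)
val_measure(y_bin, p_hat, "class")
val_measure(y_bin, p_hat, "1-auc")
```

The "class" and "1-auc" definitions here only make sense for a 0/1 response with predicted probabilities.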
Value
object of class 'spar' with elements
- betas: p x max(nummods) sparse matrix of class 'Matrix::dgCMatrix' containing the standardized coefficients from each marginal model
- intercepts: intercepts used in each marginal model
- scr_coef: vector of length p with coefficients used for screening the standardized predictors
- inds: list of index-vectors corresponding to variables kept after screening in each marginal model, of length max(nummods)
- RPMs: list of projection matrices used in each marginal model, of length max(nummods)
- val_res: data.frame with validation results (validation measure and number of active variables) for each element of nus and nummods
- val_set: logical flag, whether validation data were provided; if FALSE, training data were used for validation
- nus: vector of \nu's considered for thresholding
- nummods: vector of numbers of marginal models considered for validation
- ycenter: empirical mean of initial response vector
- yscale: empirical standard deviation of initial response vector
- xcenter: p-vector of empirical means of initial predictor variables
- xscale: p-vector of empirical standard deviations of initial predictor variables
- rp: an object of class "randomprojection"
- screencoef: an object of class "screeningcoef"

If a parallel backend is registered and parallel = TRUE, the foreach function is used to estimate the marginal models in parallel.
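To make the roles of betas, nus, nummods, and the centering and scaling elements more concrete, the sketch below combines a coefficient matrix into one coefficient vector for a single threshold and ensemble size, then maps it back to the original scale. It assumes a linear model with a standardized response and a threshold-then-average rule. The helper `combine_betas` and the small hand-made inputs are hypothetical, and this is not the package's own coef method.

```r
# Hypothetical helper; inputs mimic the shapes of the 'spar' elements above.
combine_betas <- function(betas, intercepts, nu, nummod,
                          xcenter, xscale, ycenter, yscale) {
  b <- as.matrix(betas)[, seq_len(nummod), drop = FALSE]  # first nummod models
  b[abs(b) < nu] <- 0                          # threshold small coefficients
  beta_std <- rowMeans(b)                      # average over marginal models
  int_std <- mean(intercepts[seq_len(nummod)])
  beta <- yscale * beta_std / xscale           # undo predictor and response scaling
  intercept <- ycenter + yscale * int_std - sum(beta * xcenter)
  list(intercept = intercept, beta = beta)
}

# Toy inputs: p = 3 predictors, 2 marginal models
betas <- matrix(c(0.5, 0.01, 0, 0.3, -0.02, 0.2), nrow = 3)
res <- combine_betas(betas, intercepts = c(0, 0), nu = 0.05, nummod = 2,
                     xcenter = c(1, 2, 3), xscale = c(1, 2, 4),
                     ycenter = 10, yscale = 2)
res$beta
res$intercept
```

With nu = 0.05 the second predictor is zeroed in both models, so it drops out of the averaged coefficient vector.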
References
Parzer R, Filzmoser P, Vana-Gür L (2024). “Sparse Data-Driven Random Projection in Regression for High-Dimensional Data.” Technical Report 2312.00130, arXiv.org E-Print Archive. doi:10.48550/arXiv.2312.00130.
Parzer R, Filzmoser P, Vana-Gür L (2024). “Data-Driven Random Projection and Screening for High-Dimensional Generalized Linear Models.” Technical Report 2410.00971, arXiv.org E-Print Archive. doi:10.48550/arXiv.2410.00971.
Clarkson KL, Woodruff DP (2013). “Low Rank Approximation and Regression in Input Sparsity Time.” In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC '13, 81–90. ISBN 9781450320290, doi:10.1145/2488608.2488620.
Achlioptas D (2003). “Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences, 66(4), 671–687. ISSN 0022-0000, doi:10.1016/S0022-0000(03)00025-4, Special Issue on PODS 2001.
See Also
spar.cv, coef.spar, predict.spar, plot.spar, print.spar
Examples
# Simulate training data and a separate test set
example_data <- simulate_spareg_data(n = 200, p = 2000, ntest = 100)

# Fit SPAR, using the test set to validate nu and the number of models
spar_res <- spar(example_data$x, example_data$y, xval = example_data$xtest,
                 yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))

# Extract coefficients and predict on the training predictors
coefs <- coef(spar_res)
pred <- predict(spar_res, xnew = example_data$x)

# Validation measure, along the number of models and along nu
plot(spar_res)
plot(spar_res, plot_type = "Val_Meas", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "Val_Meas", plot_along = "nu", nummod = 10)

# Number of active variables, along the number of models and along nu
plot(spar_res, plot_type = "Val_numAct", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "Val_numAct", plot_along = "nu", nummod = 10)

# Residuals versus fitted values on the test set, and a coefficient plot
plot(spar_res, plot_type = "res-vs-fitted", xfit = example_data$xtest,
     yfit = example_data$ytest)
plot(spar_res, plot_type = "coefs", prange = c(1, 400))

# The same fit through spareg()
spar_res <- spareg(example_data$x, example_data$y, xval = example_data$xtest,
                   yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))