cv.TSLA {TSLA}    R Documentation
Cross validation for TSLA
Description
Conduct cross validation to select the optimal tuning parameters in TSLA.
Usage
cv.TSLA(
y,
X_1 = NULL,
X_2,
treemat,
family = c("ls", "logit"),
penalty = c("CL2", "RFS-Sum"),
pred.loss = c("MSE", "AUC", "deviance"),
gamma.init = NULL,
weight = NULL,
nfolds = 5,
group.weight = NULL,
feature.weight = NULL,
control = list(),
modstr = list()
)
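As a quick orientation before the argument details, a minimal least-squares call might look like the sketch below. The objects y, X2, and treemat are hypothetical inputs assumed to be already prepared with getetmat(); all other settings are left at their defaults.
# Minimal sketch (hypothetical inputs y, X2, treemat):
# least-squares TSLA with default tuning controls
cvfit <- cv.TSLA(y = y, X_2 = X2, treemat = treemat,
                 family = "ls", penalty = "CL2", pred.loss = "MSE")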
Arguments
y
Response in matrix form: continuous for family = "ls" and binary (0/1) for family = "logit".
X_1
Design matrix for unpenalized features (excluding the intercept). Must be in matrix form.
X_2
Expanded design matrix for penalized features, in matrix form, as produced by the tree-guided expansion (the x.expand output of getetmat(); see the Examples).
treemat
Expanded tree structure in matrix form corresponding to X_2 (the tree.expand output of getetmat(); see the Examples).
family
Two options: use "ls" for least-squares problems and "logit" for logistic regression problems.
penalty
Two options for the group penalty on the node coefficients: "CL2" or "RFS-Sum".
pred.loss
Model performance metric used in cross validation. Use "MSE" when family = "ls"; use "AUC" or "deviance" when family = "logit".
gamma.init
Initial value for the optimization. Default is a zero vector. The length should equal 1 plus the total number of coefficients (the unpenalized coefficients plus the node coefficients).
weight
A vector of length two, used for logistic regression only. The first element is the weight of the y = 1 class and the second is the weight of the y = 0 class.
nfolds
Number of cross validation folds. Default is 5.
group.weight
User-defined weights for the group penalty. Must be a vector whose length equals the number of groups.
feature.weight
User-defined weights for each predictor after expansion.
control
A list of parameters controlling algorithm convergence, with elements such as maxit (maximum number of iterations), mu (smoothing parameter), tol (convergence tolerance), and verbose (whether to print progress); see the sketch after this argument list for a typical specification.
modstr
A list of parameters controlling the tuning parameter sequences, with elements such as nlambda (number of lambda values) and alpha (candidate alpha values); see the sketch after this argument list for a typical specification.
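A sketch of how the two lists are typically built, mirroring the settings used in the Examples below:
# Convergence controls: iteration cap, smoothing parameter mu,
# tolerance, and progress printing
control <- list(maxit = 100, mu = 1e-3, tol = 1e-5, verbose = FALSE)
# Tuning grids: number of lambda values and candidate alpha values
modstr <- list(nlambda = 5, alpha = seq(0, 1, length.out = 5))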
Value
A list of cross validation results; a sketch showing how to access the components follows the list.
lambda.min
The lambda value with the best cross validation performance.
alpha.min
The alpha value with the best cross validation performance.
cvm
A (number-of-lambda * number-of-alpha) matrix storing the means of the cross validation loss across folds.
cvsd
A (number-of-lambda * number-of-alpha) matrix storing the standard deviations of the cross validation loss across folds.
TSLA.fit
Outputs from TSLA.fit().
Intercept.min
Intercept corresponding to (lambda.min, alpha.min).
cov.min
Coefficients of the unpenalized features corresponding to (lambda.min, alpha.min).
beta.min
Coefficients of the binary features corresponding to (lambda.min, alpha.min).
gamma.min
Node coefficients corresponding to (lambda.min, alpha.min).
groupnorm.min
Group norms of the node coefficients corresponding to (lambda.min, alpha.min).
lambda.min.index
Index of the selected lambda in the lambda sequence.
alpha.min.index
Index of the selected alpha in the alpha sequence.
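Once cross validation has run, the selected tuning pair and the matching coefficients can be read directly off the returned list. A minimal access sketch, assuming cvfit is a value returned by cv.TSLA():
cvfit$lambda.min   # selected lambda value
cvfit$alpha.min    # selected alpha value
# mean CV loss at the selected (lambda, alpha) pair
cvfit$cvm[cvfit$lambda.min.index, cvfit$alpha.min.index]
cvfit$beta.min     # coefficients of the binary features at the optimum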
Examples
# Load the synthetic data
data(ClassificationExample)
tree.org <- ClassificationExample$tree.org # original tree structure
x2.org <- ClassificationExample$x.org # original design matrix
x1 <- ClassificationExample$x1
y <- ClassificationExample$y # response
# Do the tree-guided expansion
expand.data <- getetmat(tree.org, x2.org)
x2 <- expand.data$x.expand # expanded design matrix
tree.expand <- expand.data$tree.expand # expanded tree structure
# Do train-test split
idtrain <- 1:200
x1.train <- as.matrix(x1[idtrain, ])
x2.train <- x2[idtrain, ]
y.train <- y[idtrain, ]
x1.test <- as.matrix(x1[-idtrain, ])
x2.test <- x2[-idtrain, ]
y.test <- y[-idtrain, ]
# specify some model parameters
set.seed(100)
control <- list(maxit = 100, mu = 1e-3, tol = 1e-5, verbose = FALSE)
modstr <- list(nlambda = 5, alpha = seq(0, 1, length.out = 5))
simu.cv <- cv.TSLA(y = y.train, X_1 = x1.train,
                   X_2 = x2.train,
                   treemat = tree.expand, family = 'logit',
                   penalty = 'CL2', pred.loss = 'AUC',
                   gamma.init = NULL, weight = c(1, 1), nfolds = 5,
                   group.weight = NULL, feature.weight = NULL,
                   control = control, modstr = modstr)
# Predict on the test set with the selected tuning parameters and report the AUC
rmid <- simu.cv$TSLA.fit$rmid  # all-zero columns removed during fitting
if (length(rmid) > 0) {
  x2.test <- x2.test[, -rmid]
}
y.new <- predict_cvTSLA(simu.cv, x1.test, x2.test)
library(pROC)
auc(as.vector(y.test), as.vector(y.new))
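Beyond the single selected pair, the full cross validation surface can be inspected to judge how sensitive the fit is to the tuning grid. A short sketch continuing from the example above:
# Mean CV loss over the (lambda, alpha) grid,
# with fold-wise standard deviations
simu.cv$cvm
simu.cv$cvsd
# Position of the selected pair within the grid
c(simu.cv$lambda.min.index, simu.cv$alpha.min.index)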