PenOpt {fusedTree} | R Documentation |
Tuning of the penalty parameters of fusedTree using cross-validation
Description
Tuning is conducted by optimizing the cross-validated likelihood.
Users can either include the fusion penalty (by specifying alphaInit > 0
),
or omit the fusion penalty (by specifying alphaInit = 0
). If
alphaInit = 0
, only the standard ridge penalty lambda
is tuned.
Note that Dat_Tree()
is called internally so please provide the
original data as input arguments.
Usage
PenOpt(
Tree,
X,
Y,
Z,
model = NULL,
lambdaInit = 10,
alphaInit = 10,
folds = CVfoldsTree(Y = Y, Tree = Tree, Z = Z, model = model),
loss = "loglik",
multistart = FALSE,
maxIter = 30,
LinVars = FALSE
)
Arguments
Tree |
The fitted tree. Currently this should be a tree fitted using
|
X |
The original omics data matrix. Has dimensions (sample size × number of omics variables). Should be a matrix. |
Y |
The response; should be either numeric, binary (encoded by 0 and 1),
or a survival object created by |
Z |
The original clinical data matrix, which was used to fit the tree.
Should be a |
model |
Character. Specifies the outcome model. One of |
lambdaInit |
Numeric. Initial value for the standard ridge (L2) penalty
|
alphaInit |
Numeric. Initial value for the fusion penalty |
folds |
List. Each element contains the indices of the test samples for a fold. It is advisable to balance the samples with respect to the outcome (for binary and survival models) and the tree structure. If not provided, folds are generated internally. |
loss |
Character. The loss function to optimize in cross-validation.
For binary and survival outcomes, only |
multistart |
Logical. Whether to initialize with different starting values when
optimizing the cross-validated likelihood. Can help with stability when both
|
maxIter |
Integer. Maximum number of iterations for the IRLS (iterative reweighted least squares) algorithm. Used only for logistic and Cox models. Defaults to 30. |
LinVars |
Logical. Whether to include continuous clinical variables
linearly in the model (in addition to the tree structure). Can be helpful
since trees may not capture linear effects well. Defaults to |
Details
The cross-validated likelihood is optimized using the Nelder-Mead
method from stats::optim()
. When tuning both lambda
and
alpha
, the objective function can be noisy. Setting
multistart = TRUE
performs optimization from several starting values
to improve robustness. This is only applicable when alphaInit > 0
.
Value
A numeric vector with the tuned values of the penalties:
-
lambda
: standard ridge (L2) penalty. -
alpha
: fusion penalty (only ifalphaInit > 0
).
If alphaInit = 0
, only the tuned lambda
is returned.
References
Examples
p = 5 # number of omics variables (low for illustration)
p_Clin = 5 # number of clinical variables
N = 100 # sample size
# simulate from Friedman-like function
g <- function(z) {
15 * sin(pi * z[,1] * z[,2]) + 10 * (z[,3] - 0.5)^2 + 2 * exp(z[,4]) + 2 * z[,5]
}
set.seed(11)
Z <- as.data.frame(matrix(runif(N * p_Clin), nrow = N))
X <- matrix(rnorm(N * p), nrow = N) # omics data
betas <- c(1,-1,3,4,2) # omics effects
Y <- g(Z) + X %*% betas + rnorm(N) # continuous outcome
Y <- as.vector(Y)
dat = cbind.data.frame(Y, Z) #set-up data correctly for rpart
rp <- rpart::rpart(Y ~ ., data = dat,
control = rpart::rpart.control(xval = 5, minbucket = 10),
model = TRUE)
cp = rp$cptable[,1][which.min(rp$cptable[,4])] # best model according to pruning
Treefit <- rpart::prune(rp, cp = cp)
plot(Treefit)
folds <- CVfoldsTree(Y = Y, Tree = Treefit, Z = Z, model = "linear")
optPenalties <- PenOpt(Tree = Treefit, X = X, Y = Y, Z = Z,
model = "linear", lambdaInit = 10, alphaInit = 10,
loss = "loglik",
LinVars = FALSE,
folds = folds, multistart = FALSE)
optPenalties