PenOpt {fusedTree}R Documentation

Tuning of the penalty parameters of fusedTree using cross-validation

Description

Tuning is conducted by optimizing the cross-validated likelihood. Users can either include the fusion penalty (by specifying alphaInit > 0), or omit the fusion penalty (by specifying alphaInit = 0). If alphaInit = 0, only the standard ridge penalty lambda is tuned. Note that Dat_Tree() is called internally so please provide the original data as input arguments.

Usage

PenOpt(
  Tree,
  X,
  Y,
  Z,
  model = NULL,
  lambdaInit = 10,
  alphaInit = 10,
  folds = CVfoldsTree(Y = Y, Tree = Tree, Z = Z, model = model),
  loss = "loglik",
  multistart = FALSE,
  maxIter = 30,
  LinVars = FALSE
)

Arguments

Tree

The fitted tree. Currently this should be a tree fitted using rpart. Trees fitted by other R packages (e.g., partykit) may be allowed in the future.

X

The original omics data matrix. Has dimensions (sample size × number of omics variables). Should be a matrix.

Y

The response; should be either numeric, binary (encoded by 0 and 1), or a survival object created by Surv() from the survival package. Only right-censored survival data is allowed.

Z

The original clinical data matrix, which was used to fit the tree. Should be a data.frame.

model

Character. Specifies the outcome model. One of "linear", "logistic", or "cox".

lambdaInit

Numeric. Initial value for the standard ridge (L2) penalty lambda. Must be greater than zero. Defaults to 10.

alphaInit

Numeric. Initial value for the fusion penalty alpha. If set to 0, fusion is omitted and only lambda is tuned. Must be zero or greater. Defaults to 10.

folds

List. Each element contains the indices of the test samples for a fold. It is advisable to balance the samples with respect to the outcome (for binary and survival models) and the tree structure. If not provided, folds are generated internally.

loss

Character. The loss function to optimize in cross-validation. For binary and survival outcomes, only "loglik" (cross-validated likelihood) is supported. For continuous outcomes, an alternative is "sos" (sum of squares loss). Defaults to "loglik".

multistart

Logical. Whether to initialize with different starting values when optimizing the cross-validated likelihood. Can help with stability when both lambda and alpha are tuned, at the cost of longer runtime. Defaults to FALSE.

maxIter

Integer. Maximum number of iterations for the IRLS (iterative reweighted least squares) algorithm. Used only for logistic and Cox models. Defaults to 30.

LinVars

Logical. Whether to include continuous clinical variables linearly in the model (in addition to the tree structure). Can be helpful since trees may not capture linear effects well. Defaults to TRUE.

Details

The cross-validated likelihood is optimized using the Nelder-Mead method from stats::optim(). When tuning both lambda and alpha, the objective function can be noisy. Setting multistart = TRUE performs optimization from several starting values to improve robustness. This is only applicable when alphaInit > 0.

Value

A numeric vector with the tuned values of the penalties:

If alphaInit = 0, only the tuned lambda is returned.

References

porridge

Examples

p = 5 # number of omics variables (low for illustration)
p_Clin = 5 # number of clinical variables
N = 100 # sample size
# simulate from Friedman-like function
g <- function(z) {
  15 * sin(pi * z[,1] * z[,2]) + 10 * (z[,3] - 0.5)^2 + 2 * exp(z[,4]) + 2 * z[,5]
}
set.seed(11)
Z <- as.data.frame(matrix(runif(N * p_Clin), nrow = N))
X <- matrix(rnorm(N * p), nrow = N)            # omics data
betas <- c(1,-1,3,4,2)                         # omics effects
Y <- g(Z) + X %*% betas + rnorm(N)             # continuous outcome
Y <- as.vector(Y)
dat = cbind.data.frame(Y, Z) #set-up data correctly for rpart
rp <- rpart::rpart(Y ~ ., data = dat,
                   control = rpart::rpart.control(xval = 5, minbucket = 10),
                   model = TRUE)
cp = rp$cptable[,1][which.min(rp$cptable[,4])] # best model according to pruning
Treefit <- rpart::prune(rp, cp = cp)
plot(Treefit)
folds <- CVfoldsTree(Y = Y, Tree = Treefit, Z = Z, model = "linear")
optPenalties <- PenOpt(Tree = Treefit, X = X, Y = Y, Z = Z,
                       model = "linear", lambdaInit = 10, alphaInit = 10,
                       loss = "loglik",
                       LinVars = FALSE,
                       folds = folds, multistart = FALSE)
optPenalties

[Package fusedTree version 1.0.1 Index]