survivalSL {survivalSL}                R Documentation

Super Learner for Censored Outcomes

Description

This function fits a Super Learner (SL) to predict survival outcomes.

Usage

survivalSL(formula, data, methods, metric="auc", penalty=NULL,
cv=10, param.tune=NULL, pro.time=NULL,
optim.local.min=FALSE, ROC.precision=seq(.01,.99,.01),
param.weights.fix=NULL, param.weights.init=NULL,
seed=NULL, optim.method="Nelder-Mead", maxit=1000,
show_progress=TRUE)

Arguments

formula

A formula object, with the response on the left of a ~ operator, and the terms on the right. The response must be a survival object as returned by the Surv function.

data

A data frame whose columns correspond to the variables present in the formula.

methods

A character vector with the names of the algorithms included in the SL. At least two algorithms must be included.

metric

The loss function or metric used to estimate the weights of the algorithms in the SL. See details.

penalty

A numeric vector that forces covariates into the final model regardless of selection (applies to "LIB_COXaic") and/or exempts covariates from penalization (applies to "LIB_COXen", "LIB_COXlasso" and "LIB_COXridge"). Use 0 for a covariate that must be kept in the model and/or left unpenalized, and 1 otherwise. If NULL, all covariates undergo the selection and/or penalization process.

cv

The number of splits for cross-validation. The default value is 10.

param.tune

A list whose length equals the number of algorithms included in methods. If NULL, the tuning parameters are estimated (see Details).

pro.time

This optional prognostic time is the maximum delay for which the prognostic capacity of the variable is evaluated. It must be expressed in the same unit as the follow-up times of the response. Not used for the following metrics: "ll", "ibs", and "ibll". The default value is the time at which half of the subjects are still at risk.

optim.local.min

An optional logical value. If TRUE, the optimization is performed twice to make the estimation of the weights more reliable. If FALSE (default value), the optimization is performed once.

ROC.precision

The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve. Only used when metric="auc". 0 (min) and 1 (max) are not allowed. By default: seq(.01,.99,.01).

param.weights.fix

A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. When supplied, these parameters are not estimated. The default value is NULL: the parameters are estimated by cv-fold cross-validation. See Details.

param.weights.init

A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. The default value is NULL: the initial values are set to 0. See Details.

seed

A random seed to ensure reproducibility. If NULL, a seed is randomly assigned.

optim.method

The optimization method used to estimate the weights. It can be either "SANN" or "Nelder-Mead". By default we use Nelder-Mead.

maxit

The number of iterations during the weight optimization process. By default, it is set to 1000.

show_progress

A logical value indicating whether the progress bar is displayed. By default, it is set to TRUE.

Details

Each element of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, survivalSL uses predefined default grids of tuning parameters for each algorithm. The final tuning parameters are chosen by cv-fold cross-validation (except for LIB_RSF, which uses the out-of-bag observations to select the hyperparameters that optimize the chosen metric).

The following metrics can be used:

Names      Description
"bs"       Brier score at the prognostic time pro.time
"p_ci"     Concordance index at the prognostic time pro.time (Pencina version)
"uno_ci"   Concordance index at the prognostic time pro.time (Uno version)
"ll"       Log-likelihood
"ibs"      Integrated Brier score up to the last observed time in the training data
"ibll"     Integrated binomial log-likelihood up to the last observed time in the training data
"ribs"     Restricted integrated Brier score up to the prognostic time pro.time
"ribll"    Restricted integrated binomial log-likelihood up to the prognostic time pro.time
"bll"      Binomial log-likelihood
"auc"      Area under the time-dependent ROC curve up to the prognostic time pro.time
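The cv-fold selection of tuning parameters can be pictured with a generic base R sketch. It is not the package's internal code: the grid, the placeholder loss toy_loss and every object name are assumptions made for illustration only.

```r
# Generic illustration of cv-fold selection of a tuning parameter.
# Nothing here is taken from survivalSL internals; all names are hypothetical.
set.seed(1)
n  <- 200   # number of subjects
cv <- 10    # number of folds, matching the default cv=10

# Randomly assign each subject to one of the cv folds (balanced sizes)
fold <- sample(rep(seq_len(cv), length.out = n))

# Hypothetical grid of candidate tuning values
grid <- c(0.01, 0.1, 1)

# Placeholder loss: in practice one would fit the learner on the training
# folds and evaluate the chosen metric on the held-out fold
toy_loss <- function(value, held_out) {
  (log10(value) + 1)^2 + length(held_out) / n
}

# Mean held-out loss for each candidate value
cv_loss <- sapply(grid, function(g) {
  mean(sapply(seq_len(cv), function(k) toy_loss(g, which(fold == k))))
})

# Candidate with the lowest mean loss
best <- grid[which.min(cv_loss)]
```

For a metric where larger values are better, such as "auc", which.max would replace which.min in this sketch.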

The following learners are available:

Names                 Description                                   Package
"LIB_AFTgamma"        Gamma-distributed AFT model                   flexsurv
"LIB_AFTggamma"       Generalized Gamma-distributed AFT model       flexsurv
"LIB_AFTweibull"      Weibull-distributed AFT model                 flexsurv
"LIB_PHexponential"   Exponential-distributed PH model              flexsurv
"LIB_PHgompertz"      Gompertz-distributed PH model                 flexsurv
"LIB_PHspline"        Spline-based PH model                         flexsurv
"LIB_COXall"          Usual Cox model                               survival
"LIB_COXaic"          Cox model with AIC-based forward selection    MASS
"LIB_COXen"           Elastic Net Cox model                         glmnet
"LIB_COXlasso"        Lasso Cox model                               glmnet
"LIB_COXridge"        Ridge Cox model                               glmnet
"LIB_RSF"             Survival Random Forest                        randomForestSRC
"LIB_PLANN"           Survival Neural Network                       survivalPLANN
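Because each learner relies on a different package, it may be useful to check that the packages behind the selected learners are installed before fitting. The sketch below copies a few rows of the table into a named vector; the object names learner_pkg, needed and available are hypothetical, and whether survivalSL performs a similar check itself is not stated here.

```r
# Map a few learners to the package listed for them in the table above
learner_pkg <- c(LIB_AFTgamma = "flexsurv",
                 LIB_COXall   = "survival",
                 LIB_COXlasso = "glmnet",
                 LIB_RSF      = "randomForestSRC")

# Learners one intends to pass to the methods argument
methods <- c("LIB_AFTgamma", "LIB_COXlasso")

# Packages needed by the selected learners
needed <- unique(unname(learner_pkg[methods]))

# TRUE/FALSE for each needed package, without attaching it
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
```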

The loss functions available for estimating the super learner weights (argument metric) are the metrics listed above.
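To make the link between the multinomial logistic parameters (param.weights.init, param.weights.fix) and the learner weights concrete, here is a minimal sketch. It assumes the usual parameterization in which K learners use K-1 parameters and the last learner is the reference; this page does not state the exact parameterization, and the helper weights_from_param is hypothetical.

```r
# Sketch: turning multinomial logistic parameters into learner weights.
# Assumption (not stated in this page): K learners use K-1 parameters,
# the last learner being the reference category.
weights_from_param <- function(param) {
  expb <- exp(param)
  c(expb, 1) / (1 + sum(expb))
}

# All parameters equal to 0 (the documented default initial values):
# three learners receive equal weights
w0 <- weights_from_param(c(0, 0))

# A positive parameter gives more weight to the corresponding learner
w1 <- weights_from_param(c(1, 0))
```

Under this assumption the weights are positive and sum to 1, so starting from parameters equal to 0 amounts to starting from a plain average of the learners.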

Value

times

A vector of numeric values with the times of the predictions.

predictions

A matrix with the survival predictions of the SL.

FitALL

A list of matrices with the survival predictions of each learner used to build the SL.

formula

The formula object used for the SL construction.

data

The data frame used for learning.

ROC.precision

The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve.

cv

The number of splits for cross-validation.

methods

A vector of characters with the names of the algorithms included in the SL.

pro.time

The maximum delay for which the capacity of the variable is evaluated.

models

A list with the estimated models/algorithms included in the SL.

weights

A list composed of two vectors: the regression coefficients of the multinomial logistic regression and the resulting weights.

metric

A list composed of two vectors: the loss function used to estimate the weights of the algorithms in the SL and its cross-validated value.

param.tune

The estimated tuning parameters.

seed

The random seed used.

optim.method

The optimization method used.

References

Polley E and van der Laan M. Super Learner In Prediction. http://biostats.bepress.com. 2010.

Examples

data("dataDIVAT2")

# Super Learner fitted on the first 200 individuals of the dataset

formula<-Surv(times,failures) ~ age + hla + retransplant + ecd

sl1 <- survivalSL(formula=formula, data=dataDIVAT2[1:200,],
                  methods=c("LIB_AFTgamma", "LIB_PHgompertz"))

# Individual prediction
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
                retransplant=c(1,1), ecd=c(0,1)))

plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)",
     ylab="Predicted survival", col=1, type="l", lty=1, lwd=2, ylim=c(0,1))

lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)

legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))
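
A further sketch, using only arguments documented on this page, shows how the metric, the prognostic time, the number of folds and the seed might be set explicitly. The choice of median(d$times) for pro.time, the values cv=5 and seed=123, and the object names d and sl2 are illustrative assumptions. The block runs only if survivalSL is installed.

```r
# Hedged sketch: skipped silently when survivalSL is not installed
if (requireNamespace("survivalSL", quietly = TRUE)) {
  library(survival)     # provides Surv()
  library(survivalSL)
  data("dataDIVAT2")
  d <- dataDIVAT2[1:200, ]

  sl2 <- survivalSL(formula = Surv(times, failures) ~ age + hla + retransplant + ecd,
                    data = d,
                    methods = c("LIB_AFTgamma", "LIB_PHgompertz"),
                    metric = "bs",               # Brier score at pro.time
                    pro.time = median(d$times),  # illustrative choice
                    cv = 5, seed = 123, show_progress = FALSE)

  print(sl2$weights)  # regression coefficients and resulting weights
  print(sl2$metric)   # loss function and its cross-validated value
}
```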

[Package survivalSL version 0.98 Index]