survivalSL {survivalSL} | R Documentation |
Super Learner for Censored Outcomes
Description
This function fits a Super Learner (SL) to predict survival outcomes.
Usage
survivalSL(formula, data, methods, metric="auc", penalty=NULL,
cv=10, param.tune=NULL, pro.time=NULL,
optim.local.min=FALSE, ROC.precision=seq(.01,.99,.01),
param.weights.fix=NULL, param.weights.init=NULL,
seed=NULL, optim.method="Nelder-Mead", maxit=1000,
show_progress=TRUE)
Arguments
formula |
A formula object, with the response on the left of a ~ operator, and the terms on the right. The response must be a survival object as returned by the |
data |
A data frame whose columns correspond to the variables present in the formula. |
methods |
A vector of characters with the names of the algorithms included in the SL. At least two algorithms have to be included. |
metric |
The loss function or metric used to estimate the weights of the algorithms in the SL. See details. |
penalty |
A numerical vector that allows the integration of covariates into the final model after selection (It concerns |
cv |
The number of splits for cross-validation. The default value is 10. |
param.tune |
A list with a length equal to the number of algorithms included in |
pro.time |
This optional value of prognostic time represents the maximum delay for which the prognostic capacity is evaluated. It must be in the same unit as the one used in the argument times. Not used for the following metrics: "ll", "ibs", and "ibll". The default value is the time at which half of the subjects are still at risk. |
optim.local.min |
An optional logical value. If |
ROC.precision |
The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve. Only used when |
param.weights.fix |
A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in |
param.weights.init |
A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in |
seed |
A random seed to ensure reproducibility. If |
optim.method |
The optimization method used to estimate the weights. It can be either |
maxit |
The maximum number of iterations during the weight optimization process. By default, it is set to 1000. |
show_progress |
Parameter to display the progress bar. By default, it is set to |
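As an illustration of the pro.time default described above (the time at which half of the subjects are still at risk), here is a minimal base R sketch. It assumes that this default amounts to the median of the observed follow-up times; the source does not state how survivalSL computes it internally. The names follow_up, at_risk_fraction and pro_time_default are hypothetical.

```r
# Simulated follow-up times (events and censorings alike); illustrative only
set.seed(42)
n <- 100
follow_up <- rexp(n, rate = 0.1)

# Fraction of subjects still at risk at time t, i.e. with follow-up >= t
at_risk_fraction <- function(t) mean(follow_up >= t)

# Assumed default: the median follow-up time, at which half of the
# subjects are still at risk
pro_time_default <- median(follow_up)
at_risk_fraction(pro_time_default)
```

With an even number of subjects and no tied times, exactly half of the subjects have a follow-up at or beyond this value.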
Details
Each element of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, survivalSL uses predefined default grids of tuning parameters for each algorithm. The final tuning parameters are chosen by cv-fold cross-validation (except for LIB_RSF, which uses the out-of-bag observations to select the hyperparameters that optimize the chosen metric).
The following metrics can be used: "bs" for the Brier score at the prognostic time pro.time, "p_ci" and "uno_ci" for the concordance index at the prognostic time pro.time (Pencina and Uno versions), "ll" for the log-likelihood, "ibs" for the integrated Brier score up to the last observed time in the training data, "ibll" for the integrated binomial log-likelihood up to the last observed time in the training data, "ribs" for the restricted integrated Brier score up to the prognostic time pro.time, "ribll" for the restricted integrated binomial log-likelihood up to the prognostic time pro.time, "bll" for the binomial log-likelihood, and "auc" for the area under the time-dependent ROC curve up to the prognostic time pro.time.
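The naming rule for param.tune can be sketched as follows. This base R example only illustrates the constraint stated above (one named element per method); the inner grid names lambda and k are hypothetical placeholders and may not match the tuning parameters expected by survivalSL, and check_param_tune is a helper invented for this illustration, not part of the package.

```r
# Methods to include in the SL (names taken from the learner table)
methods <- c("LIB_COXlasso", "LIB_PHspline")

# One element per method, named after the method.
# The grid contents below are hypothetical placeholders.
param.tune <- list(
  LIB_COXlasso = list(lambda = c(0.01, 0.1, 1)),
  LIB_PHspline = list(k = 1:3)
)

# Hypothetical helper: TRUE if param.tune is NULL (default grids are used)
# or if it has exactly one element per method with matching names
check_param_tune <- function(param.tune, methods) {
  if (is.null(param.tune)) return(TRUE)
  length(param.tune) == length(methods) &&
    setequal(names(param.tune), methods)
}

check_param_tune(param.tune, methods)
```

A check of this kind can catch a misnamed or missing element before a long cross-validation run is started.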
The following learners are available:
Names | Description | Package |
"LIB_AFTgamma" | Gamma-distributed AFT model | flexsurv |
"LIB_AFTggamma" | Generalized Gamma-distributed AFT model | flexsurv |
"LIB_AFTweibull" | Weibull-distributed AFT model | flexsurv |
"LIB_PHexponential" | Exponential-distributed PH model | flexsurv |
"LIB_PHgompertz" | Gompertz-distributed PH model | flexsurv |
"LIB_PHspline" | Spline-based PH model | flexsurv |
"LIB_COXall" | Usual Cox model | survival |
"LIB_COXaic" | Cox model with AIC-based forward selection | MASS |
"LIB_COXen" | Elastic Net Cox model | glmnet |
"LIB_COXlasso" | Lasso Cox model | glmnet |
"LIB_COXridge" | Ridge Cox model | glmnet |
"LIB_RSF" | Survival Random Forest | randomForestSRC |
"LIB_PLANN" | Survival Neural Network | survivalPLANN |
The following loss functions for the estimation of the super learner weights are available (metric):
Area under the ROC curve ("auc")
Pencina concordance index ("p_ci")
Uno concordance index ("uno_ci")
Brier score ("bs")
Binomial log-likelihood ("bll")
Integrated Brier score ("ibs")
Integrated binomial log-likelihood ("ibll")
Restricted integrated Brier score ("ribs")
Restricted integrated binomial log-likelihood ("ribll")
Log-likelihood ("ll")
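To make the link between a loss function, the multinomial logistic parameters (param.weights.init) and the optimizer (optim.method, maxit) more concrete, here is a simplified base R sketch. It is not the package's implementation: it assumes a softmax parameterization with the last learner as reference, ignores censoring entirely, and uses an uncensored Brier score at a single time point on simulated predictions. All object names are hypothetical.

```r
set.seed(1)
n <- 200
truth <- runif(n)              # true survival probability at the prognostic time
y <- rbinom(n, 1, truth)       # 1 = still event-free at the prognostic time

# Simulated predictions of three hypothetical learners
clip <- function(p) pmin(pmax(p, 0), 1)
preds <- cbind(
  A = clip(truth + rnorm(n, 0, 0.05)),   # accurate learner
  B = clip(truth + rnorm(n, 0, 0.30)),   # noisy learner
  C = rep(mean(y), n)                    # constant learner
)

# K - 1 free parameters mapped to K positive weights summing to 1
# (assumed parameterization: the last learner is the reference, fixed at 0)
softmax_weights <- function(par) {
  eta <- c(par, 0)
  w <- exp(eta - max(eta))
  w / sum(w)
}

# Uncensored Brier score of the weighted combination
brier <- function(par) {
  w <- softmax_weights(par)
  mean((y - drop(preds %*% w))^2)
}

# Weight estimation with the same optimizer settings as the defaults above
fit <- optim(par = c(0, 0), fn = brier, method = "Nelder-Mead",
             control = list(maxit = 1000))
w_hat <- softmax_weights(fit$par)
round(w_hat, 3)
```

The softmax mapping keeps the weights positive and summing to one without constrained optimization, which is why an unconstrained method such as Nelder-Mead can be used.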
Value
times |
A vector of numeric values with the times of the |
predictions |
A matrix with the survival predictions of the SL. |
FitALL |
A list of matrices with the survival predictions of each learner used for the SL construction. |
formula |
The formula object used for the SL construction. |
data |
The data frame used for learning. |
ROC.precision |
The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve. |
cv |
The number of splits for cross-validation. |
methods |
A vector of characters with the names of the algorithms included in the SL. |
pro.time |
The maximum delay for which the capacity of the variable is evaluated. |
models |
A list with the estimated models/algorithms included in the SL. |
weights |
A list composed by two vectors: the regressions |
metric |
A list composed of two vectors: the loss function used to estimate the weights of the algorithms in the SL and its cross-validation value. |
param.tune |
The estimated tuning parameters. |
seed |
The random seed used. |
optim.method |
The optimization method used. |
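The relation between the returned elements times, FitALL, weights and predictions can be illustrated with a toy example. The sketch below assumes that the SL prediction is the weighted sum of the learners' survival matrices (one row per subject, one column per time); the values and the weight vector w are made up for illustration and do not come from a fitted model.

```r
# Toy prediction times and learner-specific survival matrices
# (2 subjects in rows, 4 times in columns); values are made up
times <- c(0, 1, 2, 3)
FitALL <- list(
  LIB_AFTgamma   = rbind(c(1, 0.90, 0.80, 0.60), c(1, 0.80, 0.60, 0.40)),
  LIB_PHgompertz = rbind(c(1, 0.95, 0.85, 0.70), c(1, 0.70, 0.50, 0.30))
)

# Hypothetical SL weights, one per learner, summing to 1
w <- c(LIB_AFTgamma = 0.25, LIB_PHgompertz = 0.75)

# Assumed combination rule: weighted sum of the learners' matrices
sl <- Reduce(`+`, Map(function(m, wk) wk * m, FitALL, w[names(FitALL)]))
sl
```

Because each learner's curve is non-increasing and the weights are non-negative, the combined curve is non-increasing as well.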
References
Polley E and van der Laan M. Super Learner In Prediction. http://biostats.bepress.com. 2010.
Examples
data("dataDIVAT2")
# Super Learner fitted on the first 200 individuals of the database
formula<-Surv(times,failures) ~ age + hla + retransplant + ecd
sl1 <- survivalSL(formula=formula, data=dataDIVAT2[1:200,],
methods=c("LIB_AFTgamma", "LIB_PHgompertz"))
# Individual prediction
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
retransplant=c(1,1), ecd=c(0,1)))
plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)",
ylab="Predicted survival", col=1, type="l", lty=1, lwd=2, ylim=c(0,1))
lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)
legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))