survcompare {survcompare} | R Documentation |
Cross-validates and compares Cox Proportional Hazards and Survival Random Forest models
Description
The function performs repeated nested cross-validation for the following models:
Cox-PH (survival package, survival::coxph) or Cox-Lasso (glmnet package, glmnet::cox.fit);
Survival Random Forest (randomForestSRC::rfsrc), or its ensemble with the Cox model (if use_ensemble = TRUE).
The same random seed is used for the train/test splits of all models to support a fair comparison. Performance metrics are computed for each model, including Harrell's c-index, time-dependent AUC-ROC, time-dependent Brier score, and calibration slope. The statistical significance of the performance differences between Cox-PH and the Cox-SRF ensemble is tested and reported.
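The role of the shared seed can be illustrated with a minimal base-R sketch. It is not the package's internal code: the helper `make_folds()` and the rule of offsetting the seed by the repetition number are assumptions made for illustration only.

```r
# Hypothetical helper: assign each of n rows to one of k folds,
# reproducibly for a given seed.
make_folds <- function(n, k, seed) {
  set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

n <- 100         # number of observations (illustrative)
outer_k <- 3     # plays the role of outer_cv
repeats <- 2     # plays the role of repeat_cv
base_seed <- 42  # plays the role of randomseed

# One fold assignment per repetition; each repetition gets its own seed
# (assumed rule: base seed plus repetition number).
splits <- lapply(seq_len(repeats), function(r) {
  make_folds(n, outer_k, base_seed + r)
})

# Every model evaluated on `splits` sees identical train/test partitions.
for (r in seq_len(repeats)) {
  for (fold in seq_len(outer_k)) {
    test_idx <- which(splits[[r]] == fold)
    train_idx <- which(splits[[r]] != fold)
    # fit each candidate model on train_idx and score it on test_idx here
  }
}
```

Because the fold assignment depends only on the seed, rerunning the loop for a second model reproduces the same partitions, so any difference in scores comes from the models and not from the splits.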
The function is designed to help with model selection by quantifying the loss of predictive performance (if any) when Cox-PH is used instead of a more complex model such as SRF, which can capture non-linear and interaction terms as well as non-proportional hazards. The difference in performance between the Cox-SRF ensemble and the baseline Cox-PH can be viewed as a quantification of the contribution of non-linear and cross-terms to the predictive power of the supplied predictors.
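As an illustration of how such a performance gap can be measured, the sketch below computes Harrell's c-index by hand in base R and takes the difference between two sets of risk scores. The function `harrell_c()` and the toy vectors are hypothetical and are not the package's implementation; ties in time are simply treated as not comparable here.

```r
# Harrell's c-index: among comparable pairs (the subject with the shorter
# time had an observed event), the share where that subject also has the
# higher predicted risk. Ties in risk count as half.
harrell_c <- function(time, event, risk) {
  concordant <- 0
  comparable <- 0
  for (i in seq_along(time)) {
    if (event[i] != 1) next
    for (j in seq_along(time)) {
      if (time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5
        }
      }
    }
  }
  concordant / comparable
}

# Toy data, made up for illustration only.
time <- c(2, 4, 6, 8, 10)
event <- c(1, 1, 0, 1, 1)
risk_baseline <- c(0.9, 0.5, 0.6, 0.4, 0.1)     # e.g. scores from a simpler model
risk_alternative <- c(0.9, 0.7, 0.6, 0.4, 0.1)  # e.g. scores from a more flexible model

c_baseline <- harrell_c(time, event, risk_baseline)
c_alternative <- harrell_c(time, event, risk_alternative)
c_gain <- c_alternative - c_baseline
```

A positive `c_gain` on held-out data would suggest that the more flexible model ranks subjects better than the baseline.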
The function is a wrapper for survcompare2(), which compares the CoxPH and SRF models. An alternative way to do the same analysis is to run survcox_cv() and survsrf_cv() separately and then pass their results to survcompare2().
Usage
survcompare(
df_train,
predict_factors,
fixed_time = NaN,
randomseed = NaN,
useCoxLasso = FALSE,
outer_cv = 3,
inner_cv = 3,
tuningparams = list(),
return_models = FALSE,
repeat_cv = 2,
ml = "SRF",
use_ensemble = FALSE,
max_grid_size = 10,
suppresswarn = TRUE
)
Arguments
df_train |
training data, a data frame with "time" and "event" columns to define the survival outcome |
predict_factors |
list of column names to be used as predictors |
fixed_time |
prediction time of interest. If NULL, the 0.9 quantile of event times is used |
randomseed |
random seed for replication |
useCoxLasso |
TRUE/FALSE, whether to use the regularized (Lasso) version of the Cox model; default is FALSE |
outer_cv |
k in k-fold CV for the outer loop, in which model performance is evaluated |
inner_cv |
k in k-fold CV for internal CV to tune survival random forest hyper-parameters |
tuningparams |
list of tuning parameters for random forest: 1) NULL for using a default tuning grid, or 2) a list("mtry"=c(...), "nodedepth" = c(...), "nodesize" = c(...)) |
return_models |
TRUE/FALSE to return the trained models; default is FALSE, only performance is returned |
repeat_cv |
if NULL, the cross-validation runs once; otherwise it is repeated the given number of times, each with a different random split, and the average over all repetitions is reported |
ml |
this is currently for Survival Random Forest only ("SRF") |
use_ensemble |
TRUE/FALSE for whether to train SRF on its own, apart from the CoxPH->SRF ensemble. Default is FALSE as there is not much information in SRF itself compared to the ensembled version. |
max_grid_size |
number of random grid searches for model tuning |
suppresswarn |
TRUE/FALSE, whether to suppress warnings; TRUE by default |
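Two of the arguments above, fixed_time and max_grid_size, can be illustrated with a short base-R sketch. This is not the package's code. It assumes that "event times" means the times of observed events only, and that the random search draws max_grid_size distinct rows from the full grid; the toy data are made up.

```r
# Toy survival data, for illustration only.
time <- c(5, 8, 12, 20, 30, 45)
event <- c(1, 0, 1, 1, 0, 1)

# Assumed default for fixed_time: the 0.9 quantile of observed event times.
fixed_time <- unname(quantile(time[event == 1], 0.9))

# A user-supplied tuning grid in the documented list format.
tuningparams <- list(
  "mtry" = c(2, 3),
  "nodedepth" = c(5, 25),
  "nodesize" = c(5, 15)
)
max_grid_size <- 4

# All combinations of the supplied values (2 x 2 x 2 = 8 rows).
full_grid <- expand.grid(tuningparams)

# Assumed random search: evaluate at most max_grid_size distinct combinations.
set.seed(1)
n_pick <- min(max_grid_size, nrow(full_grid))
search_grid <- full_grid[sample(nrow(full_grid), n_pick), , drop = FALSE]
```

Capping the number of evaluated combinations keeps the inner cross-validation cost predictable when the full grid is large.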
Value
outcome - cross-validation results for CoxPH, SRF, and an object containing the comparison results
Author(s)
Diana Shamsutdinova diana.shamsutdinova.github@gmail.com
Examples
df <- simulate_nonlinear(100)
predictors <- names(df)[1:4]
srf_params <- list("mtry" = c(2), "nodedepth" = c(25), "nodesize" = c(15))
mysurvcomp <- survcompare(df, predictors, tuningparams = srf_params, max_grid_size = 1)
summary(mysurvcomp)
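The alternative workflow mentioned in the Description, running the two cross-validations separately and then comparing them, might look like the sketch below. The exact signatures of survcox_cv(), survsrf_cv() and survcompare2() are not given on this page; the sketch assumes that the first two take the data frame and predictor names as their first two arguments and accept randomseed as survcompare() does, and that survcompare2() takes the baseline result first and the alternative second. Check the help pages of these functions before relying on it.

```r
# Runs only if the survcompare package is installed.
if (requireNamespace("survcompare", quietly = TRUE)) {
  library(survcompare)

  df <- simulate_nonlinear(100)
  predictors <- names(df)[1:4]

  # Use the same seed in both runs so the train/test splits match
  # (assumed argument name: randomseed, as in survcompare()).
  cv_cox <- survcox_cv(df, predictors, randomseed = 42)
  cv_srf <- survsrf_cv(df, predictors, randomseed = 42)

  # Assumed argument order: baseline model first, alternative second.
  comparison <- survcompare2(cv_cox, cv_srf)
  print(summary(comparison))
}
```

Running the steps separately can be convenient when one of the cross-validations is slow and its result should be reused across several comparisons.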