stackG {survML} | R Documentation |
Estimate a conditional survival function using global survival stacking
Description
Estimate a conditional survival function using global survival stacking
Usage
stackG(
time,
event = rep(1, length(time)),
entry = NULL,
X,
newX = NULL,
newtimes = NULL,
direction = "prospective",
time_grid_fit = NULL,
bin_size = NULL,
time_basis,
time_grid_approx = sort(unique(time)),
surv_form = "PI",
learner = "SuperLearner",
SL_control = list(SL.library = c("SL.mean"), V = 10, method = "method.NNLS", stratifyCV = FALSE),
tau = NULL
)
Arguments
time |
|
event |
|
entry |
Study entry variable, if applicable. Defaults to NULL. |
X |
|
newX |
|
newtimes |
|
direction |
Whether the data come from a prospective or retrospective study.
This determines whether the data are treated as subject to left truncation and
right censoring ( |
time_grid_fit |
Named list of numeric vectors of times on which to discretize
for estimation of cumulative probability functions. This is an alternative to
|
bin_size |
Size of time bin on which to discretize for estimation
of cumulative probability functions. Can be a number between 0 and 1,
indicating the size of quantile grid (e.g. |
time_basis |
How to treat time for training the binary
classifier. Options are |
time_grid_approx |
Numeric vector of times at which to
approximate product integral or cumulative hazard integral.
Defaults to sort(unique(time)). |
surv_form |
Mapping from hazard estimate to survival estimate.
Can be either "PI" (product integral) or "exp" (exponential). |
learner |
Which binary regression algorithm to use. Currently, only
"SuperLearner" is supported. |
SL_control |
Named list of parameters controlling the Super Learner fitting
process. These parameters are passed directly to the |
tau |
The maximum time of interest in a study, used for
retrospective conditional survival estimation. Rather than dealing
with right truncation separately from left truncation, it is simpler to
estimate the survival function of |
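The bin_size argument above describes a quantile grid for discretizing time. As a minimal sketch of one plausible reading, the base R code below builds a grid from quantiles of the observed times spaced bin_size apart. The data and the names obs_time and time_grid are hypothetical, and whether stackG constructs its grid in exactly this way is an assumption.

```r
# Hypothetical observed times (not taken from the example on this page)
set.seed(1)
obs_time <- rexp(200, rate = 0.2)

# A bin_size of 0.1 is read here as "one grid point per decile"
bin_size <- 0.1
probs <- seq(0, 1, by = bin_size)

# Quantile-based grid; unique() guards against ties in the data
time_grid <- sort(unique(quantile(obs_time, probs = probs, names = FALSE)))
length(time_grid)
```

A smaller bin_size gives a finer grid and therefore a larger stacked data set for the binary regression, so the choice trades resolution against computation.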
Value
A named list of class stackG, with the following components:
S_T_preds |
An |
S_C_preds |
An |
Lambda_T_preds |
An |
Lambda_C_preds |
An |
time_grid_approx |
The approximation grid for the product integral or cumulative hazard integral (user-specified). |
direction |
Whether the data come from a prospective or retrospective study (user-specified). |
tau |
The maximum time of interest in a study, used for retrospective conditional survival estimation (user-specified). |
surv_form |
Exponential or product-integral form (user-specified). |
time_basis |
Whether time is included in the regression as |
SL_control |
Named list of parameters controlling the Super Learner fitting process (user-specified). |
fits |
A named list of fitted regression objects corresponding to the constituent regressions needed for
global survival stacking. Includes |
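The surv_form component records whether the exponential or product-integral form was used to turn a cumulative hazard into a survival estimate. The sketch below illustrates the two standard mappings on a small grid in base R. The cumulative hazard values and the names time_grid, Lambda, and dLambda are hypothetical rather than stackG output, and whether stackG evaluates the mappings in exactly this way on time_grid_approx is an assumption.

```r
# Hypothetical approximation grid and cumulative hazard (constant hazard 0.1)
time_grid <- c(0.5, 1, 2, 4, 8)
Lambda <- 0.1 * time_grid

# Increments of the cumulative hazard over successive grid intervals
dLambda <- diff(c(0, Lambda))

# Exponential form: S(t) = exp(-Lambda(t))
S_exp <- exp(-Lambda)

# Product-integral form: S(t) = product over grid points s <= t of (1 - dLambda(s))
S_PI <- cumprod(1 - dLambda)

round(cbind(time_grid, S_exp, S_PI), 3)
```

Because 1 - x never exceeds exp(-x), the product-integral values are never larger than the exponential ones on the same grid, and the two agree more closely as the grid becomes finer.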
References
Wolock C.J., Gilbert P.B., Simon N., and Carone M. (2024). "A framework for leveraging machine learning tools to estimate personalized survival curves."
See Also
predict.stackG for stackG prediction method.
Examples
# This is a small simulation example
set.seed(123)
n <- 250
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))
S0 <- function(t, x) {
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))
G0 <- function(t, x) {
  as.numeric(t < 15) * .9 * pexp(t,
                                 rate = exp(-2 - .5 * x[,1] - .25 * x[,2] + .5 * x[,1] * x[,2]),
                                 lower.tail = FALSE)
}
C <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
C[C > 15] <- 15
entry <- runif(n, 0, 15)
time <- pmin(T, C)
event <- as.numeric(T <= C)
sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]
# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")
fit <- stackG(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              time_grid_approx = sort(unique(time)),
              surv_form = "exp",
              learner = "SuperLearner",
              SL_control = list(SL.library = SL.library,
                                V = 5))
plot(fit$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0,1,col='red')
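The example above keeps only observations with time >= entry, which is how left truncation enters the simulated data. As a minimal base R sketch of what that selection step does, the code below applies the same rule to hypothetical data and compares the mean event time before and after selection. The names event_time and entry_time and the chosen distributions are illustrative assumptions, not part of survML.

```r
# Hypothetical event and study entry times
set.seed(1)
n <- 1000
event_time <- rexp(n, rate = 0.2)
entry_time <- runif(n, 0, 15)

# Left truncation: a subject is observed only if the event
# has not yet occurred at study entry
sampled <- which(event_time >= entry_time)

# Mean event time in the full population versus the truncated sample
c(all = mean(event_time), sampled = mean(event_time[sampled]))
```

Subjects with longer event times are more likely to be sampled, so the truncated sample typically overrepresents long survivors. Passing entry to stackG is what allows the estimator to account for this selection.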