fit2ts {TwoTimeScales}    R Documentation

Fit a smooth hazard model with two time scales

Description

fit2ts() fits a smooth hazard model with two time scales.

Three methods are implemented for the search of the optimal smoothing parameters (and therefore optimal model): a numerical optimization of the AIC or BIC of the model, a search for the minimum AIC or BIC of the model over a grid of log_10 values for the smoothing parameters, and a solution that uses a sparse mixed model representation of the P-spline model to estimate the smoothing parameters. Construction of the B-splines bases and of the penalty matrix is incorporated within the function. If a matrix of covariates is provided, the function will estimate a model with covariates.

Usage

fit2ts(
  data2ts = NULL,
  Y = NULL,
  R = NULL,
  Z = NULL,
  bins = NULL,
  Bbases_spec = list(),
  pord = 2,
  optim_method = c("ucminf", "grid_search", "LMMsolver"),
  optim_criterion = c("aic", "bic"),
  lrho = c(0, 0),
  Wprior = NULL,
  ridge = 0,
  control_algorithm = list(),
  par_gridsearch = list()
)

Arguments

data2ts

(optional) An object created by the function prepare_data(). Providing this input is the easiest way to use fit2ts(). Alternatively, the user can provide the input data (Y, R and, optionally, Z) together with a list of bins, as described for the following arguments.

Y

A matrix (or 3d-array) of event counts of dimension nu by ns (or nu by ns by n).

R

A matrix (or 3d-array) of exposure times of dimension nu by ns (or nu by ns by n).

Z

(optional) A regression matrix of covariates values of dimensions n by p.

bins

A list with the specification for the bins. This is created by the function prepare_data(). If a list prepared outside that function is provided, it should contain the following elements:

  • bins_u A vector of bin extremes for the time scale u.

  • midu A vector with the midpoints of the bins over u.

  • nu The number of bins over u.

  • bins_s A vector of bin extremes for the time scale s.

  • mids A vector with the midpoints of the bins over s.

  • ns The number of bins over s.
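
As an illustration of the structure described above, a bins list could be assembled by hand roughly as follows. This is only a sketch: the bin width and ranges are arbitrary, the helper midpoints() is hypothetical and not part of the package, and whether fit2ts() expects any further elements has not been verified here.

```r
# Bin edges ("extremes") for each time scale; width and range are illustrative
bins_u <- seq(0, 10, by = 0.5)
bins_s <- seq(0, 8, by = 0.5)

# Hypothetical helper: midpoints of consecutive bin edges
midpoints <- function(edges) edges[-1] - diff(edges) / 2

bins <- list(
  bins_u = bins_u,
  midu   = midpoints(bins_u),
  nu     = length(bins_u) - 1,
  bins_s = bins_s,
  mids   = midpoints(bins_s),
  ns     = length(bins_s) - 1
)
str(bins)
```

With these edges the list describes a grid of nu = 20 by ns = 16 bins, which should match the first two dimensions of Y and R.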

Bbases_spec

A list with the specification for the B-splines basis with the following elements:

  • bdeg The degree of the B-splines basis. Default is 3 (for cubic B-splines).

  • nseg_u The number of segments for the B-splines over u. Default is 10.

  • min_u (optional) The lower limit of the domain of Bu. Default is min(bins_u).

  • max_u (optional) The upper limit of the domain of Bu. Default is max(bins_u).

  • nseg_s The number of segments for the B-splines over s. Default is 10.

  • min_s (optional) The lower limit of the domain of Bs. Default is min(bins_s).

  • max_s (optional) The upper limit of the domain of Bs. Default is max(bins_s).
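
To make the roles of bdeg, nseg_u, min_u and max_u concrete, the sketch below builds a marginal B-spline basis over u in the usual P-spline way (equally spaced knots, evaluated at the bin midpoints). It relies only on splines::splineDesign(), which ships with R. The helper bbase_sketch() is hypothetical, and it is an assumption that the package builds its bases in the same manner.

```r
library(splines)  # ships with base R

# Hypothetical helper: equally spaced knots with nseg segments on [xl, xr]
# and bdeg additional knots on each side of the domain
bbase_sketch <- function(x, xl = min(x), xr = max(x), nseg = 10, bdeg = 3) {
  dx <- (xr - xl) / nseg
  knots <- xl + dx * (-bdeg:(nseg + bdeg))
  splineDesign(knots, x, ord = bdeg + 1, outer.ok = TRUE)
}

bins_u <- seq(0, 10, by = 0.5)            # illustrative bin edges
midu   <- bins_u[-1] - diff(bins_u) / 2   # bin midpoints

Bu <- bbase_sketch(midu, xl = min(bins_u), xr = max(bins_u),
                   nseg = 10, bdeg = 3)

dim(Bu)             # rows = number of bins, columns = nseg + bdeg
range(rowSums(Bu))  # B-splines sum to 1 inside the domain
```

In this construction the number of basis functions is nseg + bdeg, so the defaults (nseg = 10, bdeg = 3) would give 13 B-splines per time scale.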

pord

The order of the penalty. Default is 2.
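
For readers unfamiliar with difference penalties, the following sketch shows what a penalty of order pord looks like for a single time scale. It follows the standard P-spline construction. The variable names are illustrative, and how the package combines the two marginal penalties is not shown here.

```r
cu   <- 13   # number of B-splines over u, e.g. nseg_u + bdeg = 10 + 3
pord <- 2    # order of the penalty

# Difference matrix of order pord and the corresponding penalty matrix
D <- diff(diag(cu), differences = pord)
P <- crossprod(D)   # t(D) %*% D

dim(D)       # (cu - pord) rows, cu columns
D[1, 1:4]    # coefficients of a second-order difference
```

With pord = 2 each row of D contains the pattern 1, -2, 1, so the penalty discourages departures of neighbouring coefficients from a straight line.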

optim_method

The method to be used for optimization: "ucminf" (default) for the numerical optimization of the AIC (or BIC), "grid_search" for a grid search of the minimum AIC (or BIC) over a grid of log_10(rho_u) and log_10(rho_s) values, or "LMMsolver" to solve the model as a sparse linear mixed model using the package LMMsolver.

optim_criterion

The criterion to be used for optimization: "aic" (default) or "bic". BIC penalizes model complexity more strongly than AIC, so its use is recommended when a smoother fit is preferable (see also Camarda, 2012).
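
The difference between the two criteria can be seen from their usual definitions for penalized count models, sketched below. The numbers are made up for illustration and are not output of fit2ts(). That the package computes AIC and BIC exactly in this form is an assumption.

```r
# Illustrative values, not produced by fit2ts()
dev <- 250   # model deviance
ed  <- 12    # effective dimension of the fitted model
n   <- 400   # number of observations (e.g. bins)

aic <- dev + 2 * ed
bic <- dev + log(n) * ed

c(AIC = aic, BIC = bic)
```

Because log(n) exceeds 2 whenever n is at least 8, each unit of effective dimension costs more under BIC, which pushes the selected model towards larger smoothing parameters.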

lrho

A vector of two elements if optim_method == "ucminf". Default is c(0,0). A list of two vectors of values for log_10(rho_u) and log_10(rho_s) if optim_method == "grid_search". In the latter case, if a list with two vectors is not provided, a default sequence of values is used for both log_10(rho_u) and log_10(rho_s).
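
The two forms of lrho can be sketched as follows. The range and step of the grid are arbitrary choices for this illustration, and the object names are hypothetical.

```r
# Starting values for numerical optimization (optim_method = "ucminf")
lrho_start <- c(0, 0)

# Candidate values for a grid search (optim_method = "grid_search")
lrho_grid <- list(seq(-1, 3, by = 0.5),   # log_10(rho_u)
                  seq(-1, 3, by = 0.5))   # log_10(rho_s)

# Every combination of the two vectors corresponds to one model fit
combos <- expand.grid(lrho_u = lrho_grid[[1]], lrho_s = lrho_grid[[2]])
nrow(combos)      # 9 * 9 = 81 combinations
head(10^combos)   # the same grid on the rho scale
```

The number of models grows with the product of the two vector lengths, so a coarse grid is a reasonable starting point before refining around the minimum.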

Wprior

An optional matrix of a-priori weights.

ridge

A ridge penalty parameter. Default is 0. A positive value can help when the algorithm shows convergence problems; in that case, set it to a small number, for example 1e-4.
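
One reason a small ridge can help is that a difference penalty matrix is singular, and adding a multiple of the identity makes it strictly positive definite. The sketch below shows this for a single marginal penalty. How exactly fit2ts() applies the ridge internally is an assumption not verified here.

```r
cu <- 13                               # illustrative number of B-splines
D  <- diff(diag(cu), differences = 2)
P  <- crossprod(D)                     # second-order difference penalty

ridge   <- 1e-4
P_ridge <- P + ridge * diag(cu)        # ridge added to the diagonal

# Smallest eigenvalue: numerically zero without ridge, about 1e-4 with it
min_eig       <- min(eigen(P, symmetric = TRUE)$values)
min_eig_ridge <- min(eigen(P_ridge, symmetric = TRUE)$values)
c(without_ridge = min_eig, with_ridge = min_eig_ridge)
```

The ridge shifts every eigenvalue by the same small amount, so it stabilizes the linear system while changing the fit very little.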

control_algorithm

A list with optional values for the parameters of the iterative processes:

  • maxiter The maximum number of iterations for the IWLS algorithm. Default is 20.

  • conv_crit The convergence criterion, expressed as the difference between the estimates at iterations i and i+1. Default is 1e-5.

  • verbose A Boolean. Default is FALSE. If TRUE monitors the iteration process.

  • monitor_ev A Boolean. Default is FALSE. If TRUE monitors the evaluation of the model over the log_10(rho_s) values.
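
How maxiter, conv_crit and verbose interact can be illustrated with a generic penalized Poisson IWLS loop over one time scale. This is a simplified, hypothetical sketch (the function iwls_sketch() and the toy data are invented for illustration) and not the package's internal implementation.

```r
library(splines)  # ships with base R

# Hypothetical 1-D penalized Poisson IWLS; not the package's internal code
iwls_sketch <- function(y, r, B, P, maxiter = 20, conv_crit = 1e-5,
                        verbose = FALSE) {
  alpha <- rep(log(sum(y) / sum(r)), ncol(B))   # start from a constant hazard
  for (it in seq_len(maxiter)) {
    eta <- as.vector(B %*% alpha)               # log-hazard
    mu  <- r * exp(eta)                         # expected event counts
    z   <- eta + (y - mu) / mu                  # working response
    alpha_new <- as.vector(solve(crossprod(B, mu * B) + P,
                                 crossprod(B, mu * z)))
    delta <- max(abs(alpha_new - alpha))        # change between iterations
    alpha <- alpha_new
    if (verbose) message("iteration ", it, ": max change = ", signif(delta, 3))
    if (delta < conv_crit) break                # convergence criterion met
  }
  list(alpha = alpha, iterations = it, converged = delta < conv_crit)
}

# Toy data: 40 bins, constant exposure, smooth true hazard
x <- seq(0.125, 9.875, by = 0.25)
r <- rep(20, length(x))
y <- round(r * exp(-1 + 0.5 * sin(x)))

# Cubic B-spline basis with 10 segments on [0, 10], second-order penalty
B <- splineDesign(knots = -3:13, x = x, ord = 4, outer.ok = TRUE)
D <- diff(diag(ncol(B)), differences = 2)
P <- 10 * crossprod(D)                          # smoothing parameter rho = 10

fit <- iwls_sketch(y, r, B, P, maxiter = 20, conv_crit = 1e-5, verbose = TRUE)
fit$iterations
```

If a loop of this kind stops because maxiter is reached rather than because the change falls below conv_crit, increasing maxiter or adding a small ridge are the natural first things to try.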

par_gridsearch

A list of parameters for the grid_search:

  • plot_aic A Boolean. Default is FALSE. If TRUE, plot the AIC values over the grid of log_10(rho_u) and log_10(rho_s) values.

  • plot_bic A Boolean. Default is FALSE. If TRUE, plot the BIC values over the grid of log_10(rho_u) and log_10(rho_s) values.

  • return_aic A Boolean. Default is TRUE. Return the AIC values.

  • return_bic A Boolean. Default is TRUE. Return the BIC values.

  • col The color palette to be used for the AIC/BIC plot. Default is grDevices::gray.colors(n=10).

  • plot_contour A Boolean. Default is TRUE. Adds white contour lines to the AIC/BIC plot.

  • mark_optimal A Boolean. Default is TRUE. If the plot of the AIC or BIC values is returned, marks the optimal combination of log_10(rho_u) and log_10(rho_s) in the plot.

  • main_aic The title of the AIC plot. Default is "AIC grid".

  • main_bic The title of the BIC plot. Default is "BIC grid".

Details

Some functions from the R-package LMMsolver are used here. We refer the interested readers to https://biometris.github.io/LMMsolver/ for more details on LMMsolver and its usage.

Value

An object of class haz2ts, or of class haz2tsLMM. For objects of class haz2ts this is

Objects of class haz2tsLMM have a slightly different structure. They are a list with:

References

Boer, Martin P. 2023. “Tensor Product P-Splines Using a Sparse Mixed Model Formulation.” Statistical Modelling 23 (5-6): 465–79. https://doi.org/10.1177/1471082X231178591.

Carollo, Angela, Paul H. C. Eilers, Hein Putter, and Jutta Gampe. 2023. “Smooth Hazards with Multiple Time Scales.” arXiv preprint. https://arxiv.org/abs/2305.09342v1.

Examples

# Create some fake data - the bare minimum
id <- 1:20
u <- c(5.43, 3.25, 8.15, 5.53, 7.28, 6.61, 5.91, 4.94, 4.25, 3.86, 4.05, 6.86,
       4.94, 4.46, 2.14, 7.56, 5.55, 7.60, 6.46, 4.96)
s <- c(0.44, 4.89, 0.92, 1.81, 2.02, 1.55, 3.16, 6.36, 0.66, 2.02, 1.22, 3.96,
       7.07, 2.91, 3.38, 2.36, 1.74, 0.06, 5.76, 3.00)
ev <- c(1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1)

fakedata <- as.data.frame(cbind(id, u, s, ev))
fakedata2ts <- prepare_data(data = fakedata,
                            u = "u",
                            s_out = "s",
                            ev = "ev",
                            ds = .5)
# Fit a fake model - not optimal smoothing
fit2ts(fakedata2ts,
       optim_method = "grid_search",
       lrho = list(seq(1, 1.5, .5), seq(1, 1.5, .5)))
# For more examples, see the package vignettes. More complex examples are
# omitted here because they would take longer to run.


[Package TwoTimeScales version 1.0.0 Index]