cat_cox_initialization {catalytic}    R Documentation
Initialization for Catalytic Cox proportional hazards model (COX)
Description
This function prepares and initializes a catalytic Cox proportional hazards model by processing the input data, extracting the time, status, and predictor variables, generating a synthetic dataset, and fitting a simple model.
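As an illustration of the variable-extraction step, the sketch below shows one way to read the time and status column names and the predictor terms from a Surv(time, status) ~ ... formula using base R's all.vars() and terms(). The helper parse_surv_formula() is hypothetical and is not part of catalytic; the package's own parsing may differ.

```r
# Hypothetical helper (not a catalytic export): split a survival formula of the
# form Surv(time, status) ~ predictors into its column names and predictor terms.
parse_surv_formula <- function(formula) {
  lhs_vars <- all.vars(formula[[2]])                     # variables inside Surv(...)
  rhs_terms <- attr(stats::terms(formula), "term.labels") # predictor terms (may be empty)
  list(
    time_col_name = lhs_vars[1],
    status_col_name = lhs_vars[2],
    predictors = rhs_terms,
    has_predictors = length(rhs_terms) > 0
  )
}

# The formulas are never evaluated, so the survival package need not be loaded here
simple <- parse_surv_formula(Surv(time, status) ~ 1)
full <- parse_surv_formula(Surv(time, status) ~ age + sex)
```

With ~ 1 the predictor list is empty, which corresponds to the case where simple_model is a constant hazard rate model rather than a fitted Cox model.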
Usage
cat_cox_initialization(
formula,
data,
syn_size = NULL,
hazard_constant = NULL,
entry_points = NULL,
x_degree = NULL,
resample_only = FALSE,
na_replace = stats::na.omit
)
Arguments
formula
A formula specifying the Cox model. The left-hand side gives the survival response (e.g. Surv(time, status)); the right-hand side lists the predictor variables, or 1 for a model without predictors.
data
A data frame containing the data for modeling.
syn_size
An integer specifying the size of the synthetic dataset to be generated. Default is four times the number of predictor columns.
hazard_constant
A constant hazard rate used to generate synthetic survival times when a fitted Cox model is not used. Default is NULL, in which case the rate is calculated within the function.
entry_points
A numeric vector giving the entry point of each observation. Default is NULL.
x_degree
A numeric vector giving the degree of the polynomial expansion for each predictor. Default is 1 for each predictor.
resample_only
A logical indicating whether to perform resampling only. Default is FALSE.
na_replace
A function to handle NA values in the data. Default is stats::na.omit.
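The hazard_constant and entry_points arguments can be illustrated with an exponential survival model. Assuming a constant hazard rate, its maximum-likelihood estimate is the number of events divided by the total time at risk, where each observation contributes its time minus its entry point; synthetic times can then be drawn from an exponential distribution with that rate, with every synthetic status set to 1. This is a minimal sketch under those assumptions: estimate_constant_hazard() and generate_syn_survival() are hypothetical helpers, and catalytic's internal calculation may differ.

```r
# Hypothetical helper (not a catalytic export): maximum-likelihood constant
# hazard under an exponential model, i.e. events divided by total time at risk.
# With entry points (left truncation), each row is at risk for time - entry.
estimate_constant_hazard <- function(time, status, entry = rep(0, length(time))) {
  at_risk <- time - entry
  stopifnot(length(status) == length(time), all(at_risk > 0))
  sum(status) / sum(at_risk)
}

# Hypothetical helper: synthetic survival times drawn from an exponential
# distribution with the given rate; every synthetic row is an event (status 1).
generate_syn_survival <- function(syn_size, hazard_constant) {
  data.frame(
    time = stats::rexp(syn_size, rate = hazard_constant),
    status = rep(1L, syn_size)
  )
}

obs_time <- c(5, 8, 12, 3, 9)
obs_status <- c(1, 0, 1, 1, 0)
rate <- estimate_constant_hazard(obs_time, obs_status) # 3 events / 37 time units

set.seed(1)
syn <- generate_syn_survival(syn_size = 20, hazard_constant = rate)
```

Because the exponential distribution is memoryless, entry points only shorten each observation's contribution to the time at risk; they do not change the form of the estimator.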
Value
A list containing the values of all the input arguments and the following components:
- Function Information:
  - function_name: The name of the function, "cat_cox_initialization".
  - time_col_name: The name of the time variable in the dataset.
  - status_col_name: The name of the status variable (event indicator) in the dataset.
  - simple_model: If the formula has no predictors, a constant hazard rate model; otherwise, a fitted Cox model object.
- Observation Data Information:
  - obs_size: Number of observations in the original dataset.
  - obs_data: Data frame of standardized observation data.
  - obs_x: Predictor variables of the observed data.
  - obs_time: Observed survival times.
  - obs_status: Event indicator for the observed data.
- Synthetic Data Information:
  - syn_size: Number of synthetic observations generated.
  - syn_data: Data frame of synthetic predictor and response variables.
  - syn_x: Synthetic predictor variables.
  - syn_time: Synthetic survival times.
  - syn_status: Event indicator for the synthetic data (defaults to 1).
  - syn_x_resample_inform: Information about the resampling methods applied to the synthetic predictors:
    - Coordinate: Preserves the original data values as reference coordinates during processing.
    - Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
    - Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
    - Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
    - Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
- Whole Data Information:
  - size: Total number of combined original and synthetic observations.
  - data: Data frame combining the original and synthetic datasets.
  - x: Combined predictor variables from the original and synthetic data.
  - time: Combined survival times from the original and synthetic data.
  - status: Combined event indicators from the original and synthetic data.
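To make the relationship between the synthetic and whole-data components concrete, the sketch below assumes that synthetic predictors are drawn by resampling each observed column independently, with factor columns "flattened" by sampling their levels uniformly, and that the whole dataset stacks the observed rows above the synthetic rows. resample_predictors() and all toy values are hypothetical; the resampling actually recorded in syn_x_resample_inform may work differently.

```r
# Hypothetical helper (not a catalytic export): resample each predictor column
# independently from its observed values. Factor columns are "flattened" by
# sampling their observed levels uniformly rather than by observed frequency.
resample_predictors <- function(obs_x, syn_size) {
  syn_cols <- lapply(obs_x, function(col) {
    if (is.factor(col)) {
      lv <- levels(droplevels(col))
      factor(lv[sample.int(length(lv), syn_size, replace = TRUE)],
             levels = levels(col))
    } else {
      # Index-based sampling avoids sample()'s special case for length-1 input
      col[sample.int(length(col), syn_size, replace = TRUE)]
    }
  })
  as.data.frame(syn_cols)
}

# Toy observed data (hypothetical values)
obs_x <- data.frame(
  age = c(60, 72, 55, 68, 63),
  sex = factor(c("m", "m", "m", "m", "f"))
)
obs_time <- c(5, 8, 12, 3, 9)
obs_status <- c(1, 0, 1, 1, 0)

set.seed(42)
syn_size <- 8
syn_x <- resample_predictors(obs_x, syn_size)
syn_time <- stats::rexp(syn_size, rate = 3 / 37) # assumed constant hazard (3 events / 37 time units)
syn_status <- rep(1, syn_size)                    # synthetic status defaults to 1

# Whole data: observed rows stacked above synthetic rows
x <- rbind(obs_x, syn_x)
time <- c(obs_time, syn_time)
status <- c(obs_status, syn_status)
size <- nrow(x)
```

Under these assumptions, size equals the number of observed rows plus syn_size, and the rare level "f" appears in the synthetic rows about as often as "m".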
Examples
library(catalytic)
library(survival)
data("cancer", package = "survival") # loads the cancer (lung) dataset

# Recode status from 1 = censored, 2 = dead to 0 = censored, 1 = event
cancer$status[cancer$status == 1] <- 0
cancer$status[cancer$status == 2] <- 1

cat_init <- cat_cox_initialization(
  formula = Surv(time, status) ~ 1, # Formula for a simple model without predictors
  data = cancer,
  syn_size = 100, # Synthetic data size
  hazard_constant = NULL, # NULL: the hazard rate is calculated within the function
  entry_points = rep(0, nrow(cancer)), # Entry point of each observation
  x_degree = rep(1, ncol(cancer) - 2), # Polynomial degree for each column other than time and status
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in the data
)
cat_init