cram_simulation {cramR} | R Documentation |
Cram Policy Simulation
Description
This function runs the cram method (simultaneous policy learning and evaluation) on simulated data for which the data generation process (DGP) is known. The DGP for the covariates X can be supplied directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.
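The row-wise bootstrapping mentioned above can be pictured as resampling rows of the supplied dataset with replacement. The following is a minimal base-R sketch under that assumption. The helper name `bootstrap_dgp_X` is hypothetical and not part of cramR, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of cramR): turn a dataset X into a
# covariate generator that draws n rows from X with replacement.
bootstrap_dgp_X <- function(X) {
  X <- as.data.frame(X)
  function(n) {
    idx <- sample.int(nrow(X), size = n, replace = TRUE)
    out <- X[idx, , drop = FALSE]
    rownames(out) <- NULL
    out
  }
}

set.seed(1)
X_small <- data.frame(binary = rbinom(20, 1, 0.5), continuous = rnorm(20))
dgp_X_boot <- bootstrap_dgp_X(X_small)

# 50 simulated rows, each one a copy of one of the 20 observed rows
X_sim <- dgp_X_boot(50)
dim(X_sim)
```

A generator of this shape takes the requested sample size and returns a covariate table, which is the same shape as the commented-out dgp_X in the Examples section.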
Usage
cram_simulation(
  X = NULL,
  dgp_X = NULL,
  dgp_D,
  dgp_Y,
  batch,
  nb_simulations,
  nb_simulations_truth = NULL,
  sample_size,
  model_type = "causal_forest",
  learner_type = "ridge",
  alpha = 0.05,
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  propensity = NULL
)
Arguments
X |
Optional. A matrix or data frame of covariates, one row per sample, that empirically induces the DGP for the covariates. |
dgp_X |
Optional. A function to generate covariate data for simulations. |
dgp_D |
A vectorized function to generate binary treatment assignments for each sample. |
dgp_Y |
A vectorized function to generate the outcome variable for each sample given the treatment and covariates. |
batch |
Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample. |
nb_simulations |
The number of simulations (Monte Carlo replicates) to run. |
nb_simulations_truth |
Optional. The number of additional simulations (Monte Carlo replicates), beyond nb_simulations, to use when calculating the true policy value difference (delta) and the true policy value (psi). |
sample_size |
The number of samples in each simulation. |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
alpha |
Significance level for confidence intervals. Default is 0.05 (95% confidence). |
baseline_policy |
A list providing the baseline policy (binary 0 or 1) for each sample.
If |
parallelize_batch |
Logical. Whether to parallelize batch processing.
The cram method learns T policies, where T is the number of batches.
When parallelize_batch is TRUE, the policies are learned in parallel;
when it is FALSE, they are learned sequentially using
the efficient data.table structure, which is recommended for
lightweight training. Defaults to FALSE. |
model_params |
A list of additional parameters to pass to the model.
These can be any parameters defined in the model's reference package.
Defaults to NULL. |
custom_fit |
A custom, user-defined function that returns a fitted model given training data
(allows flexibility). Defaults to NULL. |
custom_predict |
A custom, user-defined function for making predictions given a fitted model
and test data (allows flexibility). Defaults to NULL. |
propensity |
The propensity score model |
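The two accepted forms of the batch argument described above can be illustrated with base R. This sketch only builds the inputs. The shuffling shown for the vector form is an illustration of random batch assignment, not a copy of the package's internal code.

```r
set.seed(42)
sample_size <- 12

# Form 1: an integer giving the number of batches.
batch_count <- 3

# Form 2: an explicit assignment vector with one batch index per individual.
# The indices are balanced across batches and then shuffled.
batch_vector <- sample(rep(seq_len(batch_count), length.out = sample_size))

table(batch_vector)  # number of individuals assigned to each batch
```

Either `batch_count` or `batch_vector` could then be supplied as batch, provided the vector's length equals sample_size.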
Value
A list containing:
avg_proportion_treated
The average proportion of treated individuals across simulations.
avg_delta_estimate
The average delta estimate across simulations.
avg_delta_standard_error
The average standard error of delta estimates.
delta_empirical_bias
The empirical bias of delta estimates.
delta_empirical_coverage
The empirical coverage of delta confidence intervals.
avg_policy_value_estimate
The average policy value estimate across simulations.
avg_policy_value_standard_error
The average standard error of policy value estimates.
policy_value_empirical_bias
The empirical bias of policy value estimates.
policy_value_empirical_coverage
The empirical coverage of policy value confidence intervals.
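As a rough guide to reading the bias and coverage entries above, the sketch below computes both quantities from toy Monte Carlo output. It assumes normal-approximation confidence intervals at level 1 - alpha and an absolute (not relative) bias. The package's exact definitions may differ, and every number here is made up for illustration.

```r
set.seed(7)
alpha <- 0.05
true_value <- 0.3   # assumed known truth (illustrative number)
n_rep <- 1000       # number of Monte Carlo replicates

# Toy per-replicate estimates and standard errors.
se <- rep(0.1, n_rep)
estimates <- rnorm(n_rep, mean = true_value, sd = se)

# Normal-approximation confidence interval for each replicate.
z <- qnorm(1 - alpha / 2)
lower <- estimates - z * se
upper <- estimates + z * se

# Empirical bias: average estimate minus the truth.
empirical_bias <- mean(estimates) - true_value

# Empirical coverage: share of intervals that contain the truth.
empirical_coverage <- mean(lower <= true_value & true_value <= upper)
```

With alpha = 0.05, a well-calibrated estimator should show empirical coverage close to 0.95 and empirical bias close to zero.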
See Also
causal_forest, cv.glmnet, keras_model_sequential
Examples
set.seed(123)
# dgp_X <- function(n) {
#   data.table::data.table(
#     binary = rbinom(n, 1, 0.5),
#     discrete = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }
n <- 100
X_data <- data.table::data.table(
  binary = rbinom(n, 1, 0.5),
  discrete = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)
dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)
dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2,  # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4,  # Group 3: Negative benefit
           -1,
           0.1)  # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D))  # Outcome for untreated
  return(Y)
}
# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5
# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)
result$raw_results
result$interactive_table