simulateDataCore {fetwfe}    R Documentation
Generate Random Panel Data for FETWFE Simulations
Description
Generates a random panel data set for simulation studies of the fused extended two-way fixed effects (FETWFE) estimator. The function creates a balanced panel with N units over T time periods, assigns treatment status across R treated cohorts (with equal marginal probabilities for treatment and non-treatment), and constructs a design matrix along with the corresponding outcome. When gen_ints = TRUE, the full design matrix is returned, including interactions of the covariates with the fixed effects and with the treatment indicators. When gen_ints = FALSE, the design matrix is generated in a simpler format (with no interactions), as expected by fetwfe(). The covariates are generated according to the specified distribution: by default, they are drawn from a normal distribution; if distribution = "uniform", they are drawn uniformly from [-\sqrt{3}, \sqrt{3}].
When d = 0 (i.e., no covariates), no covariate-related columns or interactions are generated.
See the simulation studies section of Faletto (2025) for details.
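The panel layout described above can be illustrated with a minimal base-R sketch: N units, each observed in all T periods, with every unit assigned to one of R + 1 groups (never treated, or one of the R cohorts). The sketch assumes each of the R + 1 groups is equally likely and that cohort r starts treatment in period r + 1; it does not attempt to match every detail of simulateDataCore(), and the helper name sketch_panel() is hypothetical.

```r
# Hypothetical sketch (not fetwfe internals): build a balanced N x T panel
# skeleton and assign each unit to one of R + 1 groups with equal probability.
# Group 0 = never treated; group r (1..R) = cohort first treated in period r + 1
# (an assumption consistent with "treatment starting in periods 2 to T").
sketch_panel <- function(N, T, R, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  stopifnot(R >= 1, R <= T - 1)
  # Equal-probability group labels 0, 1, ..., R for each unit
  group <- sample(0:R, size = N, replace = TRUE)
  # Balanced long-format panel: every unit appears once in every period
  panel <- expand.grid(time = seq_len(T), unit = seq_len(N))
  panel <- panel[, c("unit", "time")]
  panel$group <- group[panel$unit]
  # Treated indicator: 1 from the cohort's assumed start period onward
  start <- ifelse(panel$group == 0, Inf, panel$group + 1)
  panel$treated <- as.integer(panel$time >= start)
  # Counts per group (never treated first), a length-(R + 1) tally
  counts <- tabulate(group + 1, nbins = R + 1)
  list(panel = panel, counts = counts)
}

sk <- sketch_panel(N = 100, T = 5, R = 3, seed = 1)
nrow(sk$panel)  # N * T rows
sum(sk$counts)  # N units in total
```

Because the panel is balanced, the long-format data always has exactly N \times T rows, regardless of how units are split across cohorts.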
Usage
simulateDataCore(
N,
T,
R,
d,
sig_eps_sq,
sig_eps_c_sq,
beta,
seed = NULL,
gen_ints = FALSE,
distribution = "gaussian"
)
Arguments
- N
Integer. Number of units in the panel.
- T
Integer. Number of time periods.
- R
Integer. Number of treated cohorts (with treatment starting in periods 2 to T).
- d
Integer. Number of time-invariant covariates.
- sig_eps_sq
Numeric. Variance of the idiosyncratic (observation-level) noise.
- sig_eps_c_sq
Numeric. Variance of the unit-level random effects.
- beta
Numeric vector. Coefficient vector for data generation. Its required length depends on the value of gen_ints.
- seed
(Optional) Integer. Seed for reproducibility.
- gen_ints
Logical. If TRUE, generate the full design matrix with interactions; if FALSE (the default), generate a design matrix without interaction terms.
- distribution
Character. Distribution used to generate the covariates. Defaults to "gaussian"; if "uniform", covariates are drawn uniformly from [-\sqrt{3}, \sqrt{3}].
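The two variance arguments describe a random-effects error structure: one unit-level draw shared across all of a unit's periods, plus independent observation-level noise. A hedged sketch, assuming both components are independent, mean-zero Gaussian draws (the exact distributions used by simulateDataCore() are not stated on this page); sketch_errors() is a hypothetical name.

```r
# Hypothetical sketch: composite error u_it = c_i + e_it, with
# c_i ~ N(0, sig_eps_c_sq) shared by unit i across its T periods and
# e_it ~ N(0, sig_eps_sq) drawn independently for every observation.
sketch_errors <- function(N, T, sig_eps_sq, sig_eps_c_sq, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  c_i <- rnorm(N, mean = 0, sd = sqrt(sig_eps_c_sq))    # one draw per unit
  e_it <- rnorm(N * T, mean = 0, sd = sqrt(sig_eps_sq))  # one draw per row
  # Rows ordered unit by unit (T consecutive rows per unit)
  rep(c_i, each = T) + e_it
}

u <- sketch_errors(N = 2000, T = 5, sig_eps_sq = 1, sig_eps_c_sq = 0.5, seed = 42)
var(u)  # approximately sig_eps_sq + sig_eps_c_sq = 1.5
```

Under these assumptions, each observation's total error variance is sig_eps_sq + sig_eps_c_sq, and two observations on the same unit have covariance sig_eps_c_sq.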
Details
When gen_ints = TRUE, the function constructs the design matrix by first generating base fixed effects and a long-format covariate matrix (via generateBaseEffects()), then appending interactions between the covariates and the cohort/time fixed effects (via generateFEInts()), and finally adding treatment indicator columns and treatment-covariate interactions (via genTreatVarsSim() and genTreatInts()). When gen_ints = FALSE, the design matrix consists only of the base fixed effects, covariates, and treatment indicators.
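To make "interactions between the covariates and fixed effects" concrete, here is a toy base-R construction with a single covariate and time fixed effects only. It is an illustration of the general idea, assuming dummy coding with period 1 as the omitted baseline; it does not reproduce the column order, the cohort interactions, or the treatment columns that fetwfe actually builds.

```r
# Toy illustration (not fetwfe's construction): one time-invariant covariate x,
# time dummies for periods 2..n_periods (period 1 omitted as baseline), and the
# covariate-by-time interactions obtained as elementwise products.
n_units <- 4
n_periods <- 3
unit_id <- rep(seq_len(n_units), each = n_periods)
period <- rep(seq_len(n_periods), times = n_units)
x_unit <- c(-1.2, 0.3, 0.8, 1.5)  # one covariate value per unit
x <- x_unit[unit_id]              # repeated across the unit's periods

# Time fixed-effect dummies (one column per non-baseline period)
time_fe <- sapply(2:n_periods, function(t) as.integer(period == t))
colnames(time_fe) <- paste0("time_", 2:n_periods)

# Each dummy column multiplied row-wise by the covariate
x_by_time <- time_fe * x
colnames(x_by_time) <- paste0("x:time_", 2:n_periods)

X_base <- cbind(time_fe, x = x)     # no interactions
X_full <- cbind(X_base, x_by_time)  # with interactions
dim(X_base)  # 12 rows, 3 columns
dim(X_full)  # 12 rows, 5 columns
```

The interaction columns let the covariate's effect differ by period; the simpler design matrix drops them.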
The argument distribution controls the generation of covariates. For "gaussian", covariates are drawn with rnorm(); for "uniform", they are drawn with runif() on the interval [-\sqrt{3}, \sqrt{3}].
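The interval [-\sqrt{3}, \sqrt{3}] is the one on which a uniform variable has mean 0 and variance 1: for U ~ Uniform(a, b), Var(U) = (b - a)^2 / 12 = (2\sqrt{3})^2 / 12 = 1. Assuming the "gaussian" option uses rnorm()'s defaults (mean 0, sd 1), the "uniform" option therefore changes the shape of the covariate distribution while matching its first two moments. A quick base-R check:

```r
# Compare uniform draws on [-sqrt(3), sqrt(3)] with standard normal draws.
set.seed(2025)
n <- 1e5
x_unif <- runif(n, min = -sqrt(3), max = sqrt(3))
x_norm <- rnorm(n)  # mean 0, sd 1 (rnorm defaults)

# Closed-form variance of Uniform(a, b) is (b - a)^2 / 12
(2 * sqrt(3))^2 / 12  # equals 1 up to floating-point rounding
c(mean = mean(x_unif), var = var(x_unif))
c(mean = mean(x_norm), var = var(x_norm))
range(x_unif)  # bounded within [-sqrt(3), sqrt(3)], unlike x_norm
```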
When d = 0 (i.e., no covariates), the function omits all covariate-related columns and their interactions.
Value
An object of class "FETWFE_simulated", which is a list containing:
- pdata
A data frame containing the generated data, which can be passed to fetwfe().
- X
The design matrix. When gen_ints = TRUE, X has p columns, including interactions; when gen_ints = FALSE, X has no interactions.
- y
A numeric vector of length N \times T containing the generated responses.
- covs
A character vector containing the names of the generated features (if d > 0), or an empty vector (if d = 0).
- time_var
The name of the time variable in pdata.
- unit_var
The name of the unit variable in pdata.
- treatment
The name of the treatment variable in pdata.
- response
The name of the response variable in pdata.
- coefs
The coefficient vector \beta used for data generation.
- first_inds
A vector of indices indicating the first treatment effect for each treated cohort.
- N_UNTREATED
The number of never-treated units.
- assignments
A vector of counts (of length R+1) indicating how many units fall into the never-treated group and into each of the R treated cohorts.
- indep_counts
Independent cohort assignments (for auxiliary purposes).
- p
The number of columns in the design matrix X.
- N
Number of units.
- T
Number of time periods.
- R
Number of treated cohorts.
- d
Number of covariates.
- sig_eps_sq
The idiosyncratic noise variance.
- sig_eps_c_sq
The unit-level noise variance.
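The indices in first_inds follow from how many treatment effects each cohort has. Below is a sketch of how they could be computed under assumptions not stated on this page: cohort r first receives treatment in period r + 1, it has one treatment-effect coefficient for each period from r + 1 through T (so T - r of them), and the effects are stacked cohort by cohort starting at index 1 of the treatment-effect block. The helper first_inds_sketch() is hypothetical.

```r
# Hypothetical helper: starting index of each cohort's treatment effects,
# assuming cohort r is treated from period r + 1 to T (T - r effects) and
# cohorts are stacked in order r = 1, ..., R.
first_inds_sketch <- function(R, T) {
  stopifnot(R >= 1, R <= T - 1)
  n_effects <- T - seq_len(R)                   # effects per cohort
  starts <- cumsum(c(1, head(n_effects, -1)))   # 1, 1 + n_1, 1 + n_1 + n_2, ...
  list(first_inds = starts, num_treats = sum(n_effects))
}

first_inds_sketch(R = 3, T = 5)
# first_inds: 1 5 8 (the cohorts have 4, 3 and 2 effects)
# num_treats: 9
```

Under these assumptions the total number of treatment effects is the sum of (T - r) over r = 1, ..., R, which equals T R - R(R + 1)/2.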
References
Faletto, G (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Examples
## Not run:
library(fetwfe)
# Set simulation parameters
N <- 100 # Number of units in the panel
T <- 5 # Number of time periods
R <- 3 # Number of treated cohorts
d <- 2 # Number of time-invariant covariates
sig_eps_sq <- 1 # Variance of observation-level noise
sig_eps_c_sq <- 0.5 # Variance of unit-level random effects
# Generate coefficient vector using genCoefsCore()
# (Here, density controls sparsity and eff_size scales nonzero entries)
coefs_core <- genCoefsCore(R = R, T = T, d = d, density = 0.2, eff_size = 2, seed = 123)
# Now simulate the data. Setting gen_ints = TRUE generates the full design
# matrix with interactions.
sim_data <- simulateDataCore(
N = N,
T = T,
R = R,
d = d,
sig_eps_sq = sig_eps_sq,
sig_eps_c_sq = sig_eps_c_sq,
beta = coefs_core$beta,
seed = 456,
gen_ints = TRUE,
distribution = "gaussian"
)
# Examine the returned list:
str(sim_data)
## End(Not run)