FactorHet {FactorHet}    R Documentation

Estimate heterogeneous effects in factorial and conjoint experiments

Description

Fit a model to estimate heterogeneous effects in factorial or conjoint experiments using a "mixture of experts" (i.e. a finite mixture of regularized regressions with covariates affecting group assignment). Effects are regularized using an overlapping group LASSO. FactorHet_mbo finds an optimal lambda via Bayesian optimization, whereas FactorHet requires lambda to be provided. FactorHet_mbo is typically used in practice.
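As a rough illustration of the "mixture of experts" structure, the base R sketch below computes moderator-driven prior group probabilities with a multinomial-logit (softmax) gating function and then mixes group-specific logistic choice probabilities. The softmax gating form, all coefficient values, and the helper softmax_rows are assumptions for illustration, not FactorHet internals; the regularized estimation itself is omitted.

```r
# Hypothetical helper: softmax across the columns of each row.
# Subtracting the row maximum avoids overflow in exp().
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)
  p <- exp(eta)
  p / rowSums(p)
}

# Hypothetical moderator matrix (intercept + one moderator) for 3 respondents
W <- cbind(1, c(-1, 0, 2))
# Hypothetical gating coefficients; group 1 is the baseline (all zeros)
gamma <- cbind(c(0, 0), c(0.5, 1))
pi_k <- softmax_rows(W %*% gamma)   # 3 x K matrix of prior probabilities

# Hypothetical group-specific linear predictors for each respondent's outcome
eta_k <- cbind(c(0.2, -0.4, 1.0), c(-1.0, 0.3, 0.8))
p_k <- plogis(eta_k)                # Pr(y = 1 | group k)

# Mixture: Pr(y = 1) = sum over k of pi_k(W) * Pr(y = 1 | group k)
mixture_prob <- rowSums(pi_k * p_k)
mixture_prob
```

Because the prior probabilities sum to one, each mixed probability lies between the smallest and largest group-specific probability for that respondent.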

Usage

FactorHet(
  formula,
  design,
  K,
  lambda,
  moderator = NULL,
  group = NULL,
  task = NULL,
  choice_order = NULL,
  weights = NULL,
  control = FactorHet_control(),
  initialize = FactorHet_init(),
  verbose = TRUE
)

FactorHet_mbo(
  formula,
  design,
  K,
  moderator = NULL,
  weights = NULL,
  group = NULL,
  task = NULL,
  choice_order = NULL,
  control = FactorHet_control(),
  initialize = FactorHet_init(),
  mbo_control = FactorHet_mbo_control()
)

Arguments

formula

A formula specifying the model. The syntax is y ~ X1 + X2, where y is the outcome and X1 and X2 are factors. Interactions can be specified using the * syntax. All main factors must be explicitly included.

design

A data.frame containing the data to be analyzed.

K

An integer specifying the number of groups; K=1 specifies a model with a single group.

lambda

A positive numeric value denoting the regularization strength; this is scaled internally by the number of observations (see FactorHet_control). FactorHet_mbo instead calibrates lambda through model-based optimization. "Details" provides more discussion of this approach.

moderator

A formula of variables (moderators) that affect the prior probability of group membership. This is ignored when K=1 or moderator=NULL.

group

A formula of a single variable, e.g. ~ person_id, that is used when there are repeated observations per individual.

task

A formula of a single variable that indicates the task number performed by each individual. This is not used when group is unspecified.

choice_order

A formula of a single variable that indicates which profile is on the "left" or "right" in a conjoint experiment.

weights

A formula of a single variable that indicates the weights for each observation (e.g., survey weights). If group is specified, the weights must be constant within each level of group.

control

An object from FactorHet_control that sets various model estimation options.

initialize

An object from FactorHet_init that determines how the model is initialized.

verbose

A logical value indicating whether to print intermediate information about model fitting. The default is TRUE.

mbo_control

A list of control parameters for MBO; see FactorHet_mbo_control for more information.
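The formula and weights requirements described in the arguments above can be checked in base R before fitting. The sketch below assumes that FactorHet follows standard R formula expansion for * (an assumption; the package's own parsing is not shown here) and uses a hypothetical helper, weights_constant_within_group, with hypothetical column names person_id, task, and w.

```r
# Standard R expansion: X1 * X2 becomes X1 + X2 + X1:X2, and duplicated
# main effects (X1, X2 written explicitly as well) are collapsed
f <- y ~ X1 + X2 + X3 + X1 * X2
term_labels <- attr(terms(f), "term.labels")
term_labels

# Hypothetical long-format data: two respondents, two tasks each
design <- data.frame(
  person_id = c(1, 1, 2, 2),
  task = c(1, 2, 1, 2),
  w = c(0.8, 0.8, 1.3, 1.3)
)

# Hypothetical helper: TRUE if the weight takes a single value within
# every level of the grouping variable
weights_constant_within_group <- function(w, g) {
  all(tapply(w, g, function(x) length(unique(x)) == 1L))
}
weights_constant_within_group(design$w, design$person_id)
```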

Details

Caution: Many settings in FactorHet_control can be modified to allow for slight variations in how the model is estimated. Some of these are faster but may introduce numerical differences across versions of R and across machines. The default settings aim to mitigate this. One of the default settings (FactorHet_control(step_SQUAREM=NULL)) considerably speeds up convergence and improves the quality of the optimum found, at the expense of occasionally introducing numerical differences across machines. To avoid this, one can disable SQUAREM (do_SQUAREM=FALSE) or use a fixed step size (e.g., step_SQUAREM=-10). If SQUAREM takes a large step, a message to this effect is issued.
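To make the step_SQUAREM setting more concrete, here is a generic sketch of a single SQUAREM extrapolation (the "S3" step length from Varadhan and Roland's SQUAREM scheme) applied to a toy fixed-point map that stands in for one EM update. It is not FactorHet's implementation: the negative-step sign convention (chosen to match step_SQUAREM=-10 above), the toy map F_toy, and the helper squarem_step are assumptions, and the safeguards that practical implementations add (such as rejecting a step that worsens the objective) are omitted.

```r
# Hypothetical helper: one SQUAREM-style extrapolation for a fixed-point
# map F. step = NULL chooses the step adaptively; otherwise step is fixed.
squarem_step <- function(theta0, F, step = NULL) {
  theta1 <- F(theta0)
  theta2 <- F(theta1)
  r <- theta1 - theta0
  v <- (theta2 - theta1) - r
  alpha <- if (is.null(step)) -sqrt(sum(r^2)) / sqrt(sum(v^2)) else step
  theta0 - 2 * alpha * r + alpha^2 * v
}

# Toy contraction (stand-in for an EM update) with fixed point theta = 2
F_toy <- function(theta) 0.5 * theta + 1

adaptive <- squarem_step(c(0, 0), F_toy)              # jumps to the fixed point
plain <- squarem_step(c(0, 0), F_toy, step = -1)      # same as two plain updates
adaptive
plain
```

With step = -1 the extrapolation reduces exactly to two ordinary updates, which is why a fixed, moderate step behaves more predictably than an adaptively chosen one.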

Factorial vs. Conjoint Experiment: A factorial experiment, i.e. one without a forced choice between profiles, can be modeled by omitting the choice_order argument and ensuring that each group and task combination corresponds to exactly one observation in the design.
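The "exactly one observation per group and task" condition can be verified in base R before fitting. The data frame, its column names, and the helper one_obs_per_group_task below are hypothetical.

```r
# Hypothetical factorial (non-forced-choice) design: one row per respondent-task
design <- data.frame(
  person_id = c(1, 1, 2, 2),
  task = c(1, 2, 1, 2),
  y = c(1, 0, 0, 1)
)

# Hypothetical helper: TRUE when every observed (group, task) pair
# identifies exactly one row; pairs that never occur (count 0) are ignored
one_obs_per_group_task <- function(data, group, task) {
  counts <- table(data[[group]], data[[task]])
  all(counts[counts > 0] == 1L)
}

one_obs_per_group_task(design, "person_id", "task")
```

A forced-choice conjoint, where each task contributes two profiles, would instead have two rows per pair and return FALSE.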

Estimation: All models are estimated using an AECM algorithm described in Goplerud et al. (2025). Calibration of the amount of regularization (i.e. choosing \lambda) should be done using FactorHet_mbo. This makes a small number of attempts (default 15) to calibrate the amount of regularization by minimizing a user-specified criterion (defaulting to the BIC), and then fits a final model using the \lambda that is predicted to minimize the criterion.
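The calibration loop can be pictured with a deliberately simplified stand-in: evaluate the criterion at a few values of log10(\lambda), fit a cheap surrogate model to those evaluations, and take the surrogate's predicted minimizer. The quadratic surrogate and the toy criterion below are assumptions for illustration only; FactorHet_mbo uses its own surrogate model and search strategy, configured through FactorHet_mbo_control, which are not reproduced here.

```r
# Hypothetical stand-in for an expensive criterion (e.g., the BIC of a fit
# at a given lambda); here an exact quadratic in log10(lambda), minimum at -2
criterion <- function(log_lambda) (log_lambda + 2)^2 + 10

# A few evaluations, analogous to initial design points
log_lambda <- c(-4, -3, -1, 0, 1)
crit <- criterion(log_lambda)

# Fit a quadratic surrogate and take its predicted minimizer -b1 / (2 * b2)
surrogate <- lm(crit ~ log_lambda + I(log_lambda^2))
b <- coef(surrogate)
log_lambda_star <- unname(-b[2] / (2 * b[3]))
lambda_star <- 10^log_lambda_star
lambda_star
```

The predicted minimizer (here log10(\lambda) = -2) was never evaluated directly, which mirrors fitting the final model at the \lambda the surrogate predicts to be best.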

Options for the model-based optimization (MBO) can be set using FactorHet_mbo_control. Options for model estimation can be set using FactorHet_control.

Ridge Regression: Ridge regression, although more experimental, can be estimated by setting lambda = 0 in FactorHet and setting prior_var_beta in FactorHet_control, or by using FactorHet_mbo and setting mbo_type = "ridge".
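The link between ridge regression and a prior variance can be sketched with ordinary single-group logistic regression in base R: an independent N(0, prior_var) prior on each slope adds sum(beta^2) / (2 * prior_var) to the negative log-likelihood, so a smaller prior variance means stronger shrinkage. This is only an analogy for prior_var_beta; the simulated data and the helpers neg_log_post and fit_ridge are hypothetical, and FactorHet's handling of groups and factor encodings is not reproduced.

```r
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- rbinom(n, 1, plogis(0.3 + x %*% c(1, -0.5, 0.25)))
X <- cbind(1, x)

# Hypothetical helper: negative log-posterior of a logistic regression with
# a N(0, prior_var) prior on the slopes (the intercept is unpenalized)
neg_log_post <- function(beta, prior_var) {
  eta <- drop(X %*% beta)
  loglik <- sum(y * eta - log1p(exp(eta)))
  -(loglik - sum(beta[-1]^2) / (2 * prior_var))
}

# Hypothetical helper: posterior mode via numerical optimization
fit_ridge <- function(prior_var) {
  optim(rep(0, ncol(X)), neg_log_post, prior_var = prior_var,
        method = "BFGS")$par
}

beta_weak <- fit_ridge(prior_var = 100)     # close to unpenalized estimates
beta_strong <- fit_ridge(prior_var = 0.01)  # slopes shrunk toward zero
rbind(beta_weak, beta_strong)
```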

Moderators: Moderators can be provided via the moderator argument. When K > 1, they are important for ensuring the stability of the model. Repeated observations per individual can be specified using group and/or task, if relevant for a forced-choice conjoint.
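Why group matters can be seen in a small sketch of posterior group membership in a mixture model: when a respondent completes several tasks, the group-specific likelihoods of all of that respondent's choices are multiplied before being combined with the moderator-implied prior. The prior and likelihood values below are hypothetical, and this is a generic illustration rather than FactorHet's internal E-step.

```r
# Hypothetical moderator-implied prior pi_k(X_i) for one respondent, K = 2
prior <- c(0.6, 0.4)

# Hypothetical Pr(observed choice in task t | group k):
# rows are tasks, columns are groups
lik <- rbind(
  c(0.7, 0.2),
  c(0.6, 0.5),
  c(0.8, 0.3)
)

# All tasks from one respondent share a single group, so the likelihoods
# multiply across tasks; the log scale avoids underflow with many tasks
log_unnorm <- log(prior) + colSums(log(lik))
posterior <- exp(log_unnorm - max(log_unnorm))
posterior <- posterior / sum(posterior)
posterior
```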

Value

Returns an object of class FactorHet. Typical use will involve examining the patterns of estimated treatment effects. cjoint_plot shows the raw (logistic) coefficients.

Marginal effects of treatments (e.g. average marginal effects) can be computed using AME, ACE, or AMIE.

The impact of moderators on group membership can be examined using margeff_moderators or posterior_by_moderators.
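For intuition about what an average marginal effect is, the sketch below computes one for a single-group logistic regression: set every observation to a given factor level, average the predicted probabilities, and subtract the same average at the baseline level. It is a simplified analogy; the simulated data and the helper ame_level are hypothetical, and FactorHet's AME works with the fitted mixture model, whose handling of groups and forced choice is not reproduced here.

```r
set.seed(2)
n <- 500
dat <- data.frame(
  Ed = factor(sample(c("low", "high"), n, replace = TRUE),
              levels = c("low", "high"))
)
dat$y <- rbinom(n, 1, ifelse(dat$Ed == "high", 0.6, 0.45))
fit <- glm(y ~ Ed, family = binomial, data = dat)

# Hypothetical helper: AME of `level` relative to `baseline` for one factor
ame_level <- function(fit, data, factor_name, level, baseline) {
  avg_pred_at <- function(lv) {
    d <- data
    d[[factor_name]] <- factor(rep(lv, nrow(data)),
                               levels = levels(data[[factor_name]]))
    mean(predict(fit, newdata = d, type = "response"))
  }
  avg_pred_at(level) - avg_pred_at(baseline)
}

ame_level(fit, dat, "Ed", "high", "low")
```

With a single factor the model is saturated, so this AME equals the difference in observed outcome rates between the two levels.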

The returned object is a list containing the following elements:

parameters:

Estimated model parameters. These are usually obtained via coef.FactorHet.

K:

The number of groups.

posterior:

Posterior group probabilities for each observation. This is a list of two data.frames: one with the posterior probabilities ("posterior") and one ("posterior_predictive") with the probabilities implied solely by the moderators, i.e. \pi_{k}(X_i) from Goplerud et al. (2025).

information_criterion:

Information on the BIC, degrees of freedom, log-likelihood, and number of iterations.

internal_parameters:

A list of many internal parameters. This is used for debugging or by other post-estimation functions.

vcov:

Named list containing the estimated variance-covariance matrix. This is usually extracted with vcov.

lp_shortEM:

If "short EM" is applied (only applicable if FactorHet, not FactorHet_mbo, is used), it lists the log-posterior at the end of each short run.

MBO:

If FactorHet_mbo is used, information about the model-based optimization (MBO) is stored here. visualize_MBO provides a quick graphical summary of the BIC at different \lambda.
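The BIC reported in information_criterion is presumably based on the usual definition, sketched below and checked against base R's BIC() for an ordinary logistic regression. How FactorHet counts degrees of freedom under the group LASSO, and which sample size it uses in the penalty term, are not stated in this section, so the df and n inputs here are assumptions; the helper bic and the simulated data are hypothetical.

```r
# Hypothetical helper: BIC = -2 * log-likelihood + df * log(n)
bic <- function(loglik, df, n) -2 * loglik + df * log(n)

# Check against base R for a simple logistic regression
set.seed(3)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial)
ll <- logLik(fit)
manual_bic <- bic(as.numeric(ll), attr(ll, "df"), nobs(fit))
c(manual = manual_bic, base_R = BIC(fit))
```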

Examples

# Use a small subset of the immigration data from Hainmueller and Hopkins
data(immigration)

set.seed(1)
# Fit with two groups and tune regularization via MBO
fit_MBO <- FactorHet_mbo(
  formula = Chosen_Immigrant ~ Country + Ed + Gender + Plans,
  design = immigration, group = ~ CaseID,
  task = ~ contest_no, choice_order = ~ choice_id,
  # Only do one guess after initialization for speed
  mbo_control = FactorHet_mbo_control(iters = 1),
  K = 2)
# Plot the raw coefficients
cjoint_plot(fit_MBO)
# Check how MBO fared at calibrating regularization
visualize_MBO(fit_MBO)
# Visualize posterior distribution of group membership
posterior_FactorHet(fit_MBO)
# Get AMEs
AME(fit_MBO)


[Package FactorHet version 1.0.0 Index]