FactorHet_mbo_control {FactorHet} | R Documentation |
Control for model-based optimization
Description
FactorHet_mbo_control is used to adjust the settings for model-based
optimization (MBO). All arguments have default values. This relies heavily
on options from the mlrMBO package, so please see that package for a more
detailed discussion.
Usage
FactorHet_mbo_control(
  mbo_type = c("sparse", "ridge"),
  mbo_initialize = "mm_mclust_prob",
  mm_init_iterations = NULL,
  mbo_range = c(-5, 0),
  mbo_method = "regr.bgp",
  final_method = "best.predicted",
  iters = 11,
  mbo_noisy = TRUE,
  criterion = c("BIC", "AIC", "GCV", "BIC_group"),
  ic_method = c("EM", "IRLS", "free_param"),
  se_final = TRUE,
  mbo_design = -1.5,
  fast_estimation = NULL,
  verbose = FALSE
)
Arguments
mbo_type |
A character argument indicating the type of model to
estimate. The default is "sparse"; the alternative is "ridge". |
mbo_initialize |
An argument for the initialization method for each MBO
proposal. The default is "mm_mclust_prob"; this and other options are described in "Details". |
mm_init_iterations |
An integer value of the number of iterations to use
if Murphy/Murphy initialization is used. The default is NULL. |
mbo_range |
A vector of numerical values that set the range of values to
consider on the log10 scale of the regularization parameter. The default is c(-5, 0). |
mbo_method |
A function used to propose new values of the regularization
parameters. See information from mlrMBO for details. The default is "regr.bgp". |
final_method |
A character argument that determines how the final
regularization parameter should be selected. The default is
"best.predicted". |
iters |
A non-negative integer value of the number of proposals to do after initialization. The default is 11. |
mbo_noisy |
A logical value for whether to treat the objective function
as "noisy" for purposes of model-based optimization. The default is
TRUE. |
criterion |
A character value of the criterion to minimize. Options are
"BIC", "AIC", "GCV", and "BIC_group". The default is "BIC". |
ic_method |
A character value for the method for calculating degrees of
freedom: "EM", "IRLS", or "free_param". The default is "EM". |
se_final |
A logical value for whether standard errors should be calculated for
the final model. The default value is TRUE. |
mbo_design |
An argument for how to design the initial proposals for MBO. The default is -1.5; this and other options are described in "Details". |
fast_estimation |
An argument as to whether a weaker convergence
criterion should be used for MBO. The default is NULL; see "Details" for the expected format. |
verbose |
A logical argument to provide more information on the initial
steps for MBO; the default is FALSE. |
Details
Initialization: FactorHet_mbo relies on the same initialization for each
attempt. The default procedure ("mm_mclust_prob") is discussed in detail in
the appendix of Goplerud et al. (2025) and builds on Murphy and Murphy
(2020). In brief, it deterministically initializes group memberships using
only the moderators (e.g. using "mclust"). Using those memberships, it runs
an EM algorithm (with probabilistic assignment if "prob" is specified, or
hard assignment otherwise) for a few steps with only the main effects to
update the proposed group memberships. If the warning "Murphy/Murphy
initialization did not fully converge" appears, this means that this initial
step did not fully converge. The number of iterations can be increased using
mm_init_iterations if desired, although benefits are usually modest beyond
the default settings. These memberships are then used to initialize the
model at each proposed regularization value.
The other options available are "spectral" and "mclust", which use
"spectral" or "mclust" on the moderators with no Murphy/Murphy-style
tuning. Alternatively, "mm_mclust" and "mm_spectral" apply the Murphy/Murphy
tuning to the corresponding deterministic initialization (e.g. spectral or
"mclust"). These use hard assignment at each step and will likely converge
more quickly, although a hard initial assignment may not be desirable.
Adding the suffix "_prob" to the "mm_*" options uses a standard
(soft-assignment) EM algorithm during the Murphy/Murphy tuning.
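As a rough illustration of the naming scheme described above, the following sketch builds the option strings from the two base methods and then passes one of them to FactorHet_mbo_control. The value given to mm_init_iterations is purely illustrative and is not a recommendation; the call to the package is skipped if FactorHet is not installed.

```r
# Build the initialization strings named in the text above.
base_methods <- c("mclust", "spectral")   # deterministic, no Murphy/Murphy tuning
mm_hard <- paste0("mm_", base_methods)    # Murphy/Murphy tuning, hard assignment
mm_soft <- paste0(mm_hard, "_prob")       # Murphy/Murphy tuning, soft assignment
init_options <- c(base_methods, mm_hard, mm_soft)
print(init_options)

# Only runs if the FactorHet package is available.
if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(
    mbo_initialize = "mm_mclust",
    mm_init_iterations = 50  # illustrative value, assumed for this sketch
  )
  str(ctrl)
}
```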
If one wishes to use a custom initialization for MBO, then set
mbo_initialize=NULL and provide an initialization via FactorHet_control. It
is strongly advised to use a deterministic initialization if done manually,
e.g. by providing a list of initial assignment probabilities for each group.
Design of MBO Proposals: The MBO procedure works as follows. First, some
initial proposals are evaluated in terms of the criterion. Given those
initial proposals, there are iters attempts to improve the criterion through
methods described in detail in mlrMBO (Bischl et al. 2018). A default of 11
seems to work well, though one can examine visualize_MBO after estimation to
see how the criterion varied across the proposals.
By default, the regularization parameter is assumed to run from -5 to 0 on
the log10 scale, before standardizing by the size of the dataset. We found
this to be reasonable, but it can be adjusted using mbo_range.
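A minimal sketch of adjusting the search range and the number of proposals follows. The specific range and number of proposals are assumptions chosen for illustration only; the conversion shows the corresponding bounds on the raw lambda scale, before any standardization by the size of the dataset.

```r
# Illustrative (assumed) settings: a wider range and more proposals than the defaults.
log10_range <- c(-6, 1)
n_proposals <- 15

# Bounds on the raw lambda scale, before standardization.
lambda_bounds <- 10^log10_range
print(lambda_bounds)

# Only runs if the FactorHet package is available.
if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(
    mbo_range = log10_range,
    iters = n_proposals
  )
  str(ctrl)
}
```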
It is possible to calibrate the initial proposals to help the algorithm find
a minimum of the criterion more quickly. This is controlled by mbo_design,
which accepts the following options. Note that a manual grid search can be
provided using the data.frame option below.
- Scalar:
By default, this is initialized with a scalar (-1.5) that is the log10 of lambda, before standardization as discussed in FactorHet_control. For a scalar value, four proposals are generated that start with the scalar value and adjust it based on the level of sparsity of the initial estimated model. This attempts to avoid initializations that are too dense, and thus very slow to estimate, as well as ones that are too sparse.
- "random":
If the string "random" is provided, this follows the default settings in mlrMBO and generates random proposals.
- data.frame:
A custom grid can be provided using a data.frame that has two columns ("l" and "y"). "l" provides the proposed values on the log10 lambda scale (before standardization). If the corresponding BIC value is known, e.g. from a prior run of the algorithm, the column "y" should contain this value. If it is unknown, leave the value as NA and the value will be estimated. Thus, if a manual grid search is desired, this can be done as follows. Create a data.frame with the grid values "l" and all "y" as NA. Then, set iters = 0 to do no estimation after the grid search.
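The manual grid search described in the data.frame option can be sketched as follows. The grid values themselves are an assumption chosen for illustration; every "y" is left as NA so that each point is estimated, and iters = 0 requests no further proposals after the grid. The call to the package is skipped if FactorHet is not installed.

```r
# Illustrative grid on the log10 lambda scale (before standardization).
grid_design <- data.frame(
  l = seq(-4, 0, by = 0.5),  # assumed grid values, for illustration
  y = NA_real_               # unknown criterion values, to be estimated
)
print(grid_design)

# Only runs if the FactorHet package is available.
if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(
    mbo_design = grid_design,
    iters = 0  # no MBO proposals after the grid is evaluated
  )
  str(ctrl$mbo_design)
}
```

If criterion values are already known from a prior run, they could be placed in the matching rows of "y" instead of NA.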
Estimation: Typically, estimation proceeds using the same settings for each
MBO proposal and for the final model estimated given the best regularization
value (see option final_method for details). However, if one wishes to use a
weaker convergence criterion for the MBO proposals to speed estimation, this
can be done using the fast_estimation option. This proceeds by giving a
named list with two members, "final" and "fast". Each of these should be a
list with two elements, "tolerance.logposterior" and "tolerance.parameters",
with the corresponding convergence thresholds. "final" is used for the final
model and "fast" is used for evaluating all of the MBO proposals.
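The nested list expected by fast_estimation might be built as in the sketch below. The numeric thresholds are assumptions chosen only to show a looser tolerance for the proposals than for the final model; they are not the package defaults. The call to the package is skipped if FactorHet is not installed.

```r
# Illustrative (assumed) thresholds: looser for proposals, tighter for the final fit.
fast_settings <- list(
  final = list(tolerance.logposterior = 1e-4, tolerance.parameters = 1e-4),
  fast  = list(tolerance.logposterior = 1e-2, tolerance.parameters = 1e-2)
)
str(fast_settings)

# Only runs if the FactorHet package is available.
if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(fast_estimation = fast_settings)
  str(ctrl$fast_estimation)
}
```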
Value
FactorHet_mbo_control returns a named list containing the elements listed in
"Arguments".
References
Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas and Michel Lang. 2018. "mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions." arXiv preprint: https://arxiv.org/abs/1703.03373
Goplerud, Max, Kosuke Imai, and Nicole E. Pashley. 2025. "Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis." arXiv preprint: https://arxiv.org/abs/2201.01357
Murphy, Keefe and Thomas Brendan Murphy. 2020. "Gaussian Parsimonious Clustering Models with Covariates and a Noise Component." Advances in Data Analysis and Classification 14:293–325.
Examples
str(FactorHet_mbo_control())