rcv_d {binaryRL} | R Documentation |
Step 2: Generating fake data for parameter and model recovery
Description
This function is designed for model and parameter recovery of user-created
(black-box) models, provided they conform to the specified interface.
(demo:
TD
,
RSTD
,
Utility
).
The process involves generating synthetic datasets. First, parameters
are randomly sampled within a defined range. These parameters are then
used to simulate artificial datasets.
Subsequently, all candidate models are used to fit these simulated datasets. Model recoverability is assessed if a synthetic dataset generated by Model A is consistently best fitted by Model A itself.
Furthermore, the function allows users to evaluate parameter recoverability. If, for instance, a synthetic dataset generated by Model A was based on parameters like 0.3 and 0.7, and Model A then recovers optimal parameters close to 0.3 and 0.7 from this data, it indicates that the parameters of Model A are recoverable.
The function provides several optimization algorithms:
1. L-BFGS-B (from
stats::optim
)2. Simulated Annealing (
GenSA::GenSA
)3. Genetic Algorithm (
GA::ga
)4. Differential Evolution (
DEoptim::DEoptim
)5. Particle Swarm Optimization (
pso::psoptim
)6. Bayesian Optimization (
mlrMBO::mbo
)7. Covariance Matrix Adapting Evolutionary Strategy (
cmaes::cma_es
)8. Nonlinear Optimization (
nloptr::nloptr
)
For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/
Usage
rcv_d(
data,
id = NULL,
n_trials = NULL,
simulate_models = list(TD, RSTD, Utility),
simulate_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
fit_models = list(TD, RSTD, Utility),
fit_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
fit_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
model_names = c("TD", "RSTD", "Utility"),
funcs = NULL,
initial_params = NA,
initial_size = 50,
iteration_s = 10,
iteration_f = 10,
seed = 1,
nc = 1,
algorithm
)
Arguments
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[vector] Specifies which subject's data to use. In parameter and model recovery analyses, the specific subject ID is often irrelevant. Although the experimental trial order might have some randomness for each subject, the sequence of reward feedback is typically pseudo-random. The default value for this argument is 'NULL'. When 'id' is 'NULL', the program automatically detects existing subject IDs within the dataset. It then randomly selects one subject as a sample, and the parameter and model recovery procedures are performed based on this selected subject's data.
|
n_trials |
[integer] Represents the total number of trials a single subject experienced in the experiment. If this parameter is kept at its default value of 'NULL', the program will automatically detect how many trials a subject experienced from the provided data. This information is primarily used for calculating model fit statistics such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).
|
simulate_models |
[list] A collection of functions used to generate simulated data. |
simulate_lower |
[list] The lower bounds for simulate models |
simulate_upper |
[list] The upper bounds for simulate models |
fit_models |
[list] A collection of functions applied to fit models to the data. |
fit_lower |
[list] The lower bounds for model fit models |
fit_upper |
[list] The upper bounds for model fit models |
model_names |
[list] The names of fit modals |
funcs |
[character] A character vector containing the names of all user-defined functions required for the computation. When parallel computation is enabled (i.e., 'nc > 1'), user-defined models and their custom functions might not be automatically accessible within the parallel environment. Therefore, if you have created your own reinforcement learning model
that modifies the package's default four default functions
(default functions:
|
initial_params |
[numeric]
Initial values for the free parameters that the optimization algorithm will
search from. These are primarily relevant when using algorithms that require
an explicit starting point, such as
|
initial_size |
[integer]
This parameter corresponds to the population size in genetic
algorithms (
|
iteration_s |
[integer] This parameter determines how many simulated datasets are created for subsequent model and parameter recovery analyses. |
iteration_f |
[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time. |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run.
|
nc |
[integer] Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process:
|
algorithm |
[character] Choose an algorithm package from 'L-BFGS-B', 'GenSA', 'GA', 'DEoptim', 'PSO', 'Bayesian', 'CMA-ES'. In addition, any algorithm from the 'nloptr' package is also supported. If your chosen 'nloptr' algorithm requires a local search, you need to input a character vector. The first element represents the algorithm used for global search, and the second element represents the algorithm used for local search. |
Value
A list where each element is a data.frame. Each data.frame within this list records the results of fitting synthetic data (generated by Model A) with Model B.
Note
During the parameter fitting process, some algorithms might get stuck in local optima. Alternatively, even a global optimum found could be a "beautiful error." This occurs because black-box functions are often non-differentiable. A very low loss function value at a specific point might result from overfitting noise in trials, with surrounding points having relatively high loss values.
A classic approach to mitigate this is Hierarchical Bayesian modeling, which constrains individual parameters at the group level. Currently, this package does not offer such a solution.
However, we highly recommend using evolutionary algorithms like
DEoptim
or GA
for optimization. These algorithms continue to
explore and weigh options even after finding an apparent optimum. If the
offspring of this optimum still yield superior solutions, it suggests that
these parameters are not extreme values obtained from overfitting noise.
Examples
## Not run:
recovery <- binaryRL::rcv_d(
data = binaryRL::Mason_2024_Exp2,
##-----------------------------------------------------------------------------##
##----------------------------- black-box function ----------------------------##
#funcs = c("your_funcs"),
model_names = c("TD", "RSTD", "Utility"),
simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
simulate_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
fit_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
fit_upper = list(c(1, 5), c(1, 1, 5), c(1, 1, 5)),
##----------------------------- interation number -----------------------------##
iteration_s = 100,
iteration_f = 100,
##-------------------------------- algorithms ---------------------------------##
nc = 1, # <nc > 1>: parallel computation across subjects
# Base R Optimization
algorithm = "L-BFGS-B" # Gradient-Based (stats)
##-----------------------------------------------------------------------------##
# Specialized External Optimization
#algorithm = "GenSA" # Simulated Annealing (GenSA)
#algorithm = "GA" # Genetic Algorithm (GA)
#algorithm = "DEoptim" # Differential Evolution (DEoptim)
#algorithm = "PSO" # Particle Swarm Optimization (pso)
#algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
#algorithm = "CMA-ES" # Covariance Matrix Adapting (cmaes)
##-----------------------------------------------------------------------------##
# Optimization Library (nloptr)
#algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
##-------------------------------- algorithms ---------------------------------##
#################################################################################
)
result <- dplyr::bind_rows(recovery) %>%
dplyr::select(simulate_model, fit_model, iteration, everything())
# Ensure the output directory exists
if (!dir.exists("../OUTPUT")) {
dir.create("../OUTPUT", recursive = TRUE)
}
write.csv(result, file = "../OUTPUT/result_recovery.csv", row.names = FALSE)
## End(Not run)