initial_parameter_optimization {topolow} | R Documentation |
Run Parameter Optimization Via Latin Hypercube Sampling
Description
Performs parameter optimization using Latin Hypercube Sampling (LHS) combined with k-fold cross-validation. Parameters are sampled from the specified ranges using a maximin LHS design to ensure good coverage of the parameter space. Each parameter set is evaluated with k-fold cross-validation to assess prediction accuracy. To obtain one negative log-likelihood (NLL) per parameter set, the function uses a pooled-errors approach: it combines the validation errors from all folds into a single set and then calculates a single NLL from that set. This approach has two main advantages:
1. It treats all validation errors equally, respecting the underlying error distribution assumption.
2. It properly accounts for the total number of validation points.
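The pooled-errors calculation can be sketched as follows. This is a minimal illustration rather than the package's implementation: it assumes Laplace-distributed validation errors with the scale set to the pooled MAE (the documentation does not name the distribution), and `pooled_nll` and `fold_errors` are hypothetical names.

```r
# Hypothetical helper: one NLL from the validation errors of all folds.
# Assumption: errors follow a Laplace distribution whose scale b is
# estimated by the pooled mean absolute error (its maximum-likelihood value).
pooled_nll <- function(fold_errors) {
  errors <- unlist(fold_errors)      # pool every fold into one vector
  errors <- errors[!is.na(errors)]   # ignore missing validation errors
  n <- length(errors)                # total number of validation points
  mae <- mean(abs(errors))
  # Laplace NLL: sum(log(2b) + |e| / b) with b = mae
  # simplifies to n * (1 + log(2 * mae))
  n * (1 + log(2 * mae))
}

# Made-up errors from two folds of unequal size
fold_errors <- list(c(0.5, -1.5), c(2.0, -1.0, 0.0))
nll <- pooled_nll(fold_errors)
```

Because the errors are pooled before the NLL is computed, a fold with more validation points contributes proportionally more terms than a smaller fold.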
Usage
initial_parameter_optimization(
  distance_matrix,
  mapping_max_iter = 1000,
  relative_epsilon,
  convergence_counter,
  scenario_name,
  N_min,
  N_max,
  k0_min,
  k0_max,
  c_repulsion_min,
  c_repulsion_max,
  cooling_rate_min,
  cooling_rate_max,
  num_samples = 20,
  max_cores = NULL,
  folds = 20,
  verbose = FALSE,
  write_files = FALSE,
  output_dir
)
Arguments
distance_matrix |
Matrix or data frame. Input distance matrix. Must be square and symmetric. Can contain NA values for missing measurements. |
mapping_max_iter |
Integer. Maximum number of optimization iterations. |
relative_epsilon |
Numeric. Convergence threshold for relative change in error. |
convergence_counter |
Integer. Number of iterations below threshold before declaring convergence. |
scenario_name |
Character. Name for output files and job identification. |
N_min, N_max |
Integer. Range for number of dimensions parameter. |
k0_min, k0_max |
Numeric. Range for initial spring constant parameter. |
c_repulsion_min, c_repulsion_max |
Numeric. Range for repulsion constant parameter. |
cooling_rate_min, cooling_rate_max |
Numeric. Range for spring decay parameter. |
num_samples |
Integer. Number of LHS samples to generate (default: 20). |
max_cores |
Integer. Maximum number of cores to use for parallel processing. If NULL, uses all available cores minus 1 (default: NULL). |
folds |
Integer. Number of cross-validation folds. Default: 20. |
verbose |
Logical. Whether to print progress messages. Default: FALSE. |
write_files |
Logical. Whether to save results to CSV. Default: FALSE. |
output_dir |
Character. Directory where output files will be saved.
Required if |
Details
The function performs these steps:
1. Generates LHS samples in parameter space
2. Creates k-fold splits of input data
3. For each parameter set and fold:
   - Trains model on training set
   - Evaluates on validation set
   - Calculates MAE and negative log likelihood
Computations are run locally in parallel.
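The fold-creation step can be sketched as below. How the package actually builds its folds is not stated here, so this assumes that a fold is a set of measured pairs that are masked with NA in the training matrix and kept aside for validation; `make_folds` is a hypothetical helper.

```r
# Hypothetical helper: split the observed entries of a symmetric distance
# matrix into k folds. Assumption: a fold is a set of measured pairs (i, j)
# that are masked with NA for training and used as validation targets.
make_folds <- function(distance_matrix, k) {
  m <- as.matrix(distance_matrix)
  # Observed pairs in the upper triangle, so each pair is counted once
  pairs <- which(upper.tri(m) & !is.na(m), arr.ind = TRUE)
  # Random fold labels of near-equal size
  fold_id <- sample(rep_len(seq_len(k), nrow(pairs)))
  lapply(seq_len(k), function(f) {
    held_out <- pairs[fold_id == f, , drop = FALSE]
    train <- m
    train[held_out] <- NA                        # mask (i, j)
    train[held_out[, 2:1, drop = FALSE]] <- NA   # mask (j, i) to keep symmetry
    list(train = train, validation_pairs = held_out, truth = m[held_out])
  })
}

set.seed(1)
toy <- as.matrix(dist(matrix(rnorm(12), ncol = 2)))  # 6 points, 15 pairs
folds <- make_folds(toy, k = 3)
```

Each list element holds a training matrix with the held-out pairs removed, plus the indices and true distances needed to compute validation errors for that fold.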
Parameter ranges are transformed to a log scale where appropriate to handle different scales effectively.
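The sampling and log-scale mapping can be illustrated with base R. This sketch uses a plain random Latin Hypercube, not the maximin design the function uses, and it assumes that `k0`, `c_repulsion`, and `cooling_rate` are the log-scaled parameters while `N` is sampled linearly and floored to an integer. `lhs_unit` and `sample_parameters` are hypothetical helpers.

```r
# Hypothetical helper: random Latin Hypercube in the unit cube.
# Each column places exactly one point in each of the n equal-width strata.
# Note: this is a plain random LHS, not the maximin variant.
lhs_unit <- function(n, k) {
  u <- matrix(0, nrow = n, ncol = k)
  for (j in seq_len(k)) u[, j] <- (sample(n) - runif(n)) / n
  u
}

# Hypothetical helper: map unit-cube samples onto the parameter ranges.
sample_parameters <- function(num_samples, N_min, N_max, k0_min, k0_max,
                              c_repulsion_min, c_repulsion_max,
                              cooling_rate_min, cooling_rate_max) {
  u <- lhs_unit(num_samples, 4)
  # Map u in (0, 1) log-uniformly onto [lo, hi]
  log_scale <- function(u, lo, hi) exp(log(lo) + u * (log(hi) - log(lo)))
  data.frame(
    # Assumption: N is sampled on a linear scale and floored to an integer
    N = pmin(floor(N_min + u[, 1] * (N_max - N_min + 1)), N_max),
    # Assumption: the continuous parameters are sampled on a log scale
    k0 = log_scale(u[, 2], k0_min, k0_max),
    c_repulsion = log_scale(u[, 3], c_repulsion_min, c_repulsion_max),
    cooling_rate = log_scale(u[, 4], cooling_rate_min, cooling_rate_max)
  )
}

set.seed(42)
params <- sample_parameters(4, N_min = 2, N_max = 5, k0_min = 1, k0_max = 10,
                            c_repulsion_min = 0.001, c_repulsion_max = 0.05,
                            cooling_rate_min = 0.001, cooling_rate_max = 0.02)
```

Sampling on the log scale spreads the points evenly across orders of magnitude, so a range such as 0.001 to 0.05 is not dominated by values near its upper end.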
Value
A data.frame containing the parameter sets and their performance metrics (Holdout_MAE and NLL). The columns of the data frame are N, k0, cooling_rate, c_repulsion, Holdout_MAE, and NLL.
If write_files is TRUE, this data frame is also saved to a CSV file as a side effect.
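One way to use the returned data frame is to rank the parameter sets by NLL, where lower values indicate a better fit. The sketch below builds a stand-in data frame with the documented columns; its values are made up for illustration and are not output from the function. Filtering out non-finite NLL values is an assumption about how failed evaluations might appear.

```r
# Made-up values purely for illustration; real output comes from
# initial_parameter_optimization().
results <- data.frame(
  N = c(2, 3, 4, 5),
  k0 = c(1.2, 3.5, 6.1, 9.0),
  cooling_rate = c(0.002, 0.005, 0.010, 0.018),
  c_repulsion = c(0.004, 0.010, 0.020, 0.045),
  Holdout_MAE = c(1.40, 0.95, 0.80, 1.10),
  NLL = c(310.2, 275.6, 262.3, 290.8)
)

# Assumption: drop any rows whose NLL is missing or infinite,
# then sort by NLL (lower is better)
results <- results[is.finite(results$NLL), ]
ranked <- results[order(results$NLL), ]
best <- ranked[1, ]
```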
See Also
create_topolow_map for the core optimization algorithm
Examples
# This example is wrapped in \donttest{} because it can exceed 5 seconds;
# the wrapper ensures it passes CRAN's automated checks.
# 1. Create a structured, synthetic dataset for the example
# Generate coordinates for a more realistic test case
synth_coords <- generate_complex_data(n_points = 20, n_dim = 3)
# Convert coordinates to a distance matrix
dist_mat <- coordinates_to_matrix(synth_coords)
# 2. Run the optimization on the synthetic data
results <- initial_parameter_optimization(
  distance_matrix = dist_mat,
  mapping_max_iter = 100,
  relative_epsilon = 1e-3,
  convergence_counter = 2,
  scenario_name = "test_opt_synthetic",
  N_min = 2, N_max = 5,
  k0_min = 1, k0_max = 10,
  c_repulsion_min = 0.001, c_repulsion_max = 0.05,
  cooling_rate_min = 0.001, cooling_rate_max = 0.02,
  num_samples = 4,
  max_cores = 2,
  verbose = FALSE
)
print(results)