generate_synthetic_datasets {topolow} | R Documentation |
Generate Synthetic Distance Matrices with Missing Data
Description
Creates synthetic distance matrices with controlled levels of missingness and noise
for testing and validating mapping algorithms. Generates multiple datasets with
different dimensionalities and missingness patterns. If output_dir is provided,
the generated datasets are saved as RDS files.
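To make "controlled levels of missingness and noise" concrete, here is a minimal base-R sketch of one way such a matrix could be built. It is not topolow's internal implementation: the Gaussian noise model, its standard deviation, and every variable name below are assumptions chosen for illustration.

```r
# Illustrative sketch only; NOT the package's internal implementation.
set.seed(123)                 # fix the RNG so the sketch is reproducible
n_points <- 50
n_dims <- 2
missing_fraction <- 0.67      # same value as the default "S" level

# Random coordinates in n_dims dimensions, then pairwise Euclidean distances
coords <- matrix(rnorm(n_points * n_dims), nrow = n_points, ncol = n_dims)
dist_mat <- as.matrix(dist(coords))

# Row/column indices of every unordered pair (upper triangle)
upper <- which(upper.tri(dist_mat), arr.ind = TRUE)

# Add Gaussian noise to each pair (assumed noise model); abs() keeps
# the perturbed distances non-negative
noisy <- abs(dist_mat[upper] + rnorm(nrow(upper), sd = 0.05))

# Mask a random subset of pairs as missing
n_missing <- round(missing_fraction * nrow(upper))
noisy[sample(nrow(upper), n_missing)] <- NA

# Write the values back into both triangles so the result stays symmetric
sim <- matrix(0, n_points, n_points)
sim[upper] <- noisy
sim[upper[, c(2, 1)]] <- noisy
```

Masking pairs rather than individual cells keeps the missingness pattern symmetric, so `sim[i, j]` and `sim[j, i]` are either both observed or both `NA`.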
Usage
generate_synthetic_datasets(
n_dims_list,
seeds,
n_points,
missingness_levels = list(S = 0.67, M = 0.77, L = 0.87),
output_dir = NULL,
prefix = "sim",
save_plots = FALSE
)
Arguments
n_dims_list |
Numeric vector of dimensions to generate data for |
seeds |
Integer vector of random seeds (same length as n_dims_list) |
n_points |
Integer number of points to generate |
missingness_levels |
Named list of missingness levels, expressed as proportions between 0 and 1 (default: list(S=0.67, M=0.77, L=0.87)) |
output_dir |
Character path to directory for saving outputs. If NULL (the default), no files are saved. |
prefix |
Character string to prefix output files (optional) |
save_plots |
Logical; whether to save network visualization plots. Requires |
Value
A list containing the generated synthetic data and metadata:
matrices |
A list of generated symmetric distance matrices for each dimension. |
panels |
A list of generated assay panels (non-symmetric matrices) for each dimension. |
metadata |
A |
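A quick sanity check on a returned distance matrix might confirm that it is square and symmetric and report how many pairs are missing. The helper below is hypothetical (it is not part of topolow) and is shown on a small hand-made matrix. Applying it to an element of `matrices` assumes that element is a plain numeric matrix, which this page does not spell out.

```r
# Hypothetical helper for illustration; not a topolow function.
summarize_distance_matrix <- function(m) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  off_diag <- m[upper.tri(m)]          # one value per unordered pair
  list(
    n_points = nrow(m),
    symmetric = identical(m, t(m)),    # NA positions must match as well
    missing_fraction = mean(is.na(off_diag))
  )
}

# Toy 3 x 3 distance matrix with one missing pair
toy <- matrix(c(0,  1,  NA,
                1,  0,  2,
                NA, 2,  0), nrow = 3, byrow = TRUE)
summary_toy <- summarize_distance_matrix(toy)
str(summary_toy)
```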
Examples
# Generate datasets without saving to disk
results <- generate_synthetic_datasets(
n_dims_list = c(2, 3),
seeds = c(123, 456),
n_points = 50
)
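# Generate datasets and save to a temporary directory
# (a subdirectory is used so that cleanup does not delete the session temp directory)
temp_out_dir <- file.path(tempdir(), "topolow_synthetic")
dir.create(temp_out_dir, showWarnings = FALSE)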
results_saved <- generate_synthetic_datasets(
n_dims_list = c(2),
seeds = c(123),
n_points = 10,
missingness_levels = list(low=0.5, high=0.8),
output_dir = temp_out_dir,
save_plots = TRUE
)
list.files(temp_out_dir)
# Clean up the directory
unlink(temp_out_dir, recursive = TRUE)
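Since the saved datasets are RDS files, they can be read back with base R's `readRDS()`. This page does not list the file names the function writes, so the sketch below saves a stand-in object under a made-up name and then loads every `.rds` file found in the directory. The directory name, file name, and toy matrix are all assumptions for illustration.

```r
# Sketch of reading RDS files back; file and directory names are hypothetical.
out_dir <- file.path(tempdir(), "rds_demo")
dir.create(out_dir, showWarnings = FALSE)

# Stand-in for a saved dataset (real file names may differ)
toy <- matrix(c(0, 1, 1, 0), nrow = 2)
saveRDS(toy, file.path(out_dir, "example_dataset.rds"))

# Find every RDS file in the directory and load each into a named list
rds_files <- list.files(out_dir, pattern = "\\.rds$",
                        ignore.case = TRUE, full.names = TRUE)
loaded <- lapply(rds_files, readRDS)
names(loaded) <- basename(rds_files)

# Remove only the demo subdirectory, leaving tempdir() itself intact
unlink(out_dir, recursive = TRUE)
```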