sensitivity_analysis {commecometrics} | R Documentation
Perform sensitivity analysis on ecometric models (quantitative environmental variables)
Description
Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects:
- Sensitivity (internal consistency): how accurately the model predicts environmental conditions on the same data it was trained on.
- Transferability (external applicability): how well the model performs on unseen data.
It tests different sample sizes by resampling the data multiple times (bootstrap iterations), training an ecometric model on each subset, and evaluating prediction error and correlation.
Usage
sensitivity_analysis(
  points_df,
  env_var,
  sample_sizes,
  iterations = 20,
  test_split = 0.2,
  grid_bins_1 = NULL,
  grid_bins_2 = NULL,
  transform_fun = NULL,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)
Arguments
points_df: The points element (first item) of the list returned by summarize_traits_by_point().
env_var: Name of the environmental variable column in points_df (e.g., "precip").
sample_sizes: Numeric vector specifying the numbers of communities (sampling points) to evaluate in the sensitivity analysis. For each value, a random subset of the data of that size is drawn without replacement and then split into training and testing sets using the proportion defined by test_split.
iterations: Number of bootstrap iterations per sample size (default: 20).
test_split: Proportion of data to use for testing (default: 0.2).
grid_bins_1: Number of bins for the first trait axis. If NULL (default), the number of bins is determined automatically.
grid_bins_2: Number of bins for the second trait axis. If NULL (default), the number of bins is determined automatically.
transform_fun: Optional function to transform the environmental variable before modeling (default: NULL, no transformation).
parallel: Logical; whether to use parallel processing (default: TRUE).
n_cores: Number of cores to use when parallel = TRUE (default: parallel::detectCores() - 1).
Details
Four base R plots are generated to visualize model performance as a function of sample size:
- Training correlation vs. sample size: how well the model fits the training data.
- Testing correlation vs. sample size: how well the model generalizes to new data.
- Training mean anomaly vs. sample size: the average prediction error on the training data.
- Testing mean anomaly vs. sample size: the average prediction error on the test data.
Parallel processing is supported to speed up the analysis.
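The resampling scheme described above can be sketched as follows. This is a simplified illustration, not the package's implementation: fit_fun and predict_fun are hypothetical stand-ins for the internal ecometric model fit, and the environmental variable is assumed to live in a column named env.

```r
# One bootstrap iteration for one sample size (simplified sketch):
# draw a subset without replacement, split it into train/test,
# fit a model on the training part, and score the held-out part.
run_one <- function(df, n, test_split, fit_fun, predict_fun) {
  sub   <- df[sample(nrow(df), n), ]          # random subset of n points
  test  <- sample(n, round(n * test_split))   # rows held out for testing
  model <- fit_fun(sub[-test, ])              # train on the remaining rows
  pred  <- predict_fun(model, sub[test, ])    # predict the held-out points
  c(correlation = cor(pred, sub$env[test]),             # testing correlation
    anomaly     = mean(abs(pred - sub$env[test])))      # mean absolute anomaly
}
```

The full analysis repeats this for every value in sample_sizes and for iterations repetitions each, which is why the per-iteration work parallelizes cleanly across cores.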
Value
A list containing:
combined_results: A data frame with mean absolute anomalies and correlations for each sample size and iteration.
summary_results: A data frame summarizing the mean anomalies and correlations across sample sizes.
Examples
# Load internal data
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")
# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)
# Run sensitivity analysis using annual precipitation
sensitivityResults <- sensitivity_analysis(
  points_df = traitsByPoint$points,
  env_var = "precip",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  transform_fun = function(x) log(x + 1),
  parallel = FALSE # Set to TRUE for faster performance on multicore machines
)
# View results
head(sensitivityResults$summary_results)
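# Continuing from the example above, a plot similar to the second diagnostic
# (testing correlation vs. sample size) could be redrawn from the returned
# summary. The column names sample_size and mean_test_cor are assumptions for
# illustration, not the package's documented output; check
# names(sensitivityResults$summary_results) first.

```r
res <- sensitivityResults$summary_results
# Hypothetical column names; adjust to the actual summary_results columns.
plot(res$sample_size, res$mean_test_cor, type = "b",
     xlab = "Sample size", ylab = "Mean testing correlation",
     main = "Transferability vs. sample size")
```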