sensitivity_analysis {commecometrics}    R Documentation

Perform sensitivity analysis on ecometric models (quantitative environmental variables)

Description

Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects: prediction error and the correlation between observed and predicted values.

It tests different sample sizes by resampling the data multiple times (bootstrap iterations), training an ecometric model on each subset, and evaluating prediction error and correlation.
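
Conceptually, each combination of sample size and iteration follows the steps sketched below. This is an outline only, not exported code; the model-fitting step is left as a comment because the internal fitting function is not part of the public API:

## Conceptual sketch of the resampling procedure (not exported code).
for (n in sample_sizes) {
  for (i in seq_len(iterations)) {
    idx       <- sample(nrow(points_df), n)          # draw n communities without replacement
    subset_df <- points_df[idx, ]
    test_id   <- sample(n, ceiling(n * test_split))  # hold out test_split for testing
    test_set  <- subset_df[test_id, ]
    train_set <- subset_df[-test_id, ]
    # fit the ecometric model on train_set, predict for both sets,
    # and record the mean absolute anomaly and correlation
  }
}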

Usage

sensitivity_analysis(
  points_df,
  env_var,
  sample_sizes,
  iterations = 20,
  test_split = 0.2,
  grid_bins_1 = NULL,
  grid_bins_2 = NULL,
  transform_fun = NULL,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)

Arguments

points_df

The points element (the first element) of the list returned by summarize_traits_by_point(). A data frame with columns summ_trait_1, summ_trait_2, count_trait, and the environmental variable.

env_var

Name of the environmental variable column in points_df (e.g., "precip").

sample_sizes

Numeric vector specifying the numbers of communities (sampling points) to evaluate in the sensitivity analysis. For each value, a random subset of that size is drawn without replacement and then split into training and testing sets according to test_split (default: 80% training, 20% testing). All values in sample_sizes must be less than or equal to the number of rows in points_df, and large enough that, after splitting, both the training and testing sets contain at least 30 communities.
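
As a quick pre-check (a sketch, not part of the package API), candidate sample sizes can be screened against these constraints before running the analysis:

## Sketch: screen candidate sample sizes against the constraints above.
## The minimum of 30 per set follows the text of this help page; the
## function's internal rounding of the split may differ slightly.
n_total     <- nrow(points_df)
test_split  <- 0.2
min_per_set <- 30
n_test  <- floor(sample_sizes * test_split)
n_train <- sample_sizes - n_test
ok <- sample_sizes <= n_total & n_test >= min_per_set & n_train >= min_per_set
sample_sizes[ok]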

iterations

Number of bootstrap iterations per sample size (default: 20).

test_split

Proportion of data to use for testing (default: 0.2).

grid_bins_1

Number of bins for the first trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().

grid_bins_2

Number of bins for the second trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().
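
For reference, Scott's rule derives a bin width from the sample standard deviation. A minimal sketch of the calculation is shown below; the actual optimal_bins() implementation may differ in detail:

## Minimal sketch of Scott's rule for choosing a bin count;
## optimal_bins() may differ in its exact implementation.
scott_bins <- function(x) {
  x <- x[is.finite(x)]
  h <- 3.49 * stats::sd(x) * length(x)^(-1/3)  # Scott (1979) bin width
  max(1L, ceiling(diff(range(x)) / h))
}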

transform_fun

Function to transform the environmental variable (default: NULL = no transformation).
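
Typical choices are simple variance-stabilizing transformations for skewed variables; the names below are illustrative only:

## Illustrative transformations for skewed environmental variables:
logShift <- function(x) log(x + 1)  # as used in the Examples below
sqrtFun  <- sqrt                    # square-root transform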

parallel

Logical; whether to use parallel processing (default: TRUE).

n_cores

Number of cores to use for parallel processing (default: parallel::detectCores() - 1).

Details

Four base R plots are generated to visualize model performance as a function of sample size:

  1. Training correlation vs. sample size: Shows how well the model fits the training data.

  2. Testing correlation vs. sample size: Shows generalizability to new data.

  3. Training mean anomaly vs. sample size: Shows the average prediction error on the training data.

  4. Testing mean anomaly vs. sample size: Shows the average prediction error on the test data.

Parallel processing is supported to speed up the analysis.
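
The same relationships can be re-plotted from the returned summary table. The sketch below assumes hypothetical column names sample_size and mean_test_cor; inspect names(sensitivityResults$summary_results) for the actual names:

## Sketch: re-plot testing correlation vs. sample size.
## Column names sample_size and mean_test_cor are hypothetical;
## check names(res) for the actual ones.
res <- sensitivityResults$summary_results
plot(res$sample_size, res$mean_test_cor,
     type = "b", xlab = "Sample size",
     ylab = "Mean testing correlation")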

Value

A list containing:

combined_results

A data frame with mean absolute anomalies and correlations for each sample size and iteration.

summary_results

A data frame summarizing the mean anomalies and correlations across iterations for each sample size.

Examples


# Load example data shipped with the package
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")

# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)

# Run sensitivity analysis using annual precipitation
sensitivityResults <- sensitivity_analysis(
  points_df = traitsByPoint$points,
  env_var = "precip",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  transform_fun = function(x) log(x + 1),
  parallel = FALSE  # Set to TRUE for faster performance on multicore machines
)

# View results
head(sensitivityResults$summary_results)


[Package commecometrics version 1.0.0 Index]