sensitivity_analysis_qual {commecometrics}    R Documentation

Perform sensitivity analysis on ecometric models (qualitative environmental variables)

Description

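Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects: internal model consistency (accuracy on the training data) and external model performance (accuracy on held-out testing data).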

For each sample size, the function resamples the data multiple times (bootstrap iterations), trains an ecometric model on each subset, and evaluates prediction accuracy on the training and testing sets.

Usage

sensitivity_analysis_qual(
  points_df,
  category_col,
  sample_sizes,
  iterations = 20,
  test_split = 0.2,
  grid_bins_1 = NULL,
  grid_bins_2 = NULL,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)

Arguments

points_df

The first element of the list returned by summarize_traits_by_point(): a data frame with the columns summ_trait_1, summ_trait_2, and count_trait, plus the environmental variable.

category_col

Name of the column in points_df containing the categorical environmental variable.

sample_sizes

Numeric vector specifying the numbers of communities (sampling points) to evaluate in the sensitivity analysis. For each value, a random subset of that size is drawn from the data without replacement and then split into training and testing sets using the proportion defined by test_split (by default, 80% training and 20% testing). Every value in sample_sizes must be less than or equal to the number of rows in points_df and large enough to allow splitting based on test_split (i.e., both the training and testing sets must contain at least 30 communities).
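The subsetting and splitting described above can be sketched as follows. This is a minimal illustration only: split_sample() is a hypothetical helper, not a commecometrics function, and rounding the test-set size with round() is an assumption about how the proportion is converted to a row count.

```r
# Hypothetical helper (not part of commecometrics): draw n rows without
# replacement, then split them into training and testing sets.
split_sample <- function(points_df, n, test_split = 0.2) {
  stopifnot(n <= nrow(points_df), test_split > 0, test_split < 1)
  idx <- sample.int(nrow(points_df), size = n, replace = FALSE)
  n_test <- round(n * test_split)  # assumption: rounded to the nearest row
  stopifnot(n_test >= 1, n_test < n)
  list(
    train = points_df[idx[(n_test + 1):n], , drop = FALSE],
    test  = points_df[idx[seq_len(n_test)], , drop = FALSE]
  )
}

# Toy data standing in for the points data frame
set.seed(1)
toy <- data.frame(
  id = 1:200,
  vegetation = sample(c("forest", "grassland"), 200, replace = TRUE)
)

parts <- split_sample(toy, n = 150, test_split = 0.2)
nrow(parts$train)
nrow(parts$test)
```

With test_split = 0.2, a sample size of 150 is the smallest that leaves 30 communities in the testing set under this rounding assumption.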

iterations

Number of bootstrap iterations per sample size (default = 20).

test_split

Proportion of data to use for testing (default = 0.2).

grid_bins_1

Number of bins for the first trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().

grid_bins_2

Number of bins for the second trait axis. If NULL (default), the number is calculated automatically using Scott's rule via optimal_bins().
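For readers unfamiliar with Scott's rule, the sketch below shows the textbook form of the calculation: a bin width of 3.49 * sd(x) * n^(-1/3), converted to a bin count by dividing the data range by that width. scott_bins() is a hypothetical helper written for illustration; it is not the source of optimal_bins(), whose exact implementation may differ.

```r
# Hypothetical illustration of Scott's rule (not the package's optimal_bins()).
# Bin width: h = 3.49 * sd(x) * n^(-1/3); bin count: ceiling(range / h).
scott_bins <- function(x) {
  x <- x[is.finite(x)]                      # drop NA, NaN, and infinite values
  h <- 3.49 * stats::sd(x) * length(x)^(-1/3)
  if (h > 0) max(1, ceiling(diff(range(x)) / h)) else 1
}

set.seed(42)
trait_values <- rnorm(500)                  # toy trait values
scott_bins(trait_values)

# Base R ships an implementation of the same rule:
grDevices::nclass.scott(trait_values)
```

Passing explicit integers to grid_bins_1 and grid_bins_2 bypasses the automatic choice, which can be useful when results need to be compared across datasets on a common grid.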

parallel

Logical; whether to run iterations in parallel (default = TRUE).

n_cores

Number of cores to use for parallelization (default = parallel::detectCores() - 1).
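Because parallel::detectCores() can return NA when the core count is unknown, and returns 1 on a single-core machine, the default expression may not always yield a usable value. A cautious caller might compute n_cores explicitly, as in this sketch; safe_cores() is a hypothetical helper, not part of commecometrics.

```r
# Hypothetical helper: choose a core count that is always at least 1.
safe_cores <- function(reserve = 1L) {
  n <- parallel::detectCores()
  if (is.na(n)) {
    return(1L)                       # core count unknown: fall back to one core
  }
  max(1L, as.integer(n) - reserve)   # leave `reserve` cores free, never below 1
}

n_cores <- safe_cores()
n_cores
```

The resulting value can then be supplied through the n_cores argument together with parallel = TRUE.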

Details

Two plots are generated:

  1. Training Accuracy vs. Sample size: Reflects internal model consistency.

  2. Testing Accuracy vs. Sample size: Reflects external model performance.

Parallel processing is supported to speed up the analysis.

Value

A list containing:

combined_results

All raw iteration results.

summary_results

Mean accuracy per sample size.

Examples


# Load internal data
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")

# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)

# Run sensitivity analysis for dominant land cover class
sensitivityQual <- sensitivity_analysis_qual(
  points_df = traitsByPoint$points,
  category_col = "vegetation",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  parallel = FALSE
)

# View results
head(sensitivityQual$summary_results)


[Package commecometrics version 1.0.0 Index]