sensitivity_analysis {MLwrap}    R Documentation
Perform Sensitivity Analysis Using Interpretable ML Methods
Description
As the final step in the MLwrap package workflow, this function performs Sensitivity Analysis (SA) on a fitted ML model stored in an analysis_object (called wrap_object in the Examples below). It evaluates feature importance using several methods, such as Permutation Feature Importance (PFI), SHAP (SHapley Additive exPlanations), Integrated Gradients, Olden sensitivity analysis, and Sobol indices. The function generates numerical results and visualizations (e.g., bar plots, box plots, beeswarm plots) to help interpret the impact of each feature on the model's predictions for both regression and classification tasks, providing critical insights after model training and evaluation.
Usage
sensitivity_analysis(analysis_object, methods = c("PFI"), metric = NULL)
Arguments
analysis_object
    An analysis_object created by the fine_tuning function.

methods
    Character vector of methods to apply: "PFI" (Permutation Feature Importance), "SHAP" (SHapley Additive exPlanations), "Integrated Gradients" (neural network models only), "Olden" (neural network models only), and/or "Sobol_Jansen" (only when all input features are continuous).

metric
    Metric used for the "PFI" method (Permutation Feature Importance). A string naming the metric (see Metrics).
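For instance, pairing the metric argument with the "PFI" method could look like the following minimal sketch, assuming wrap_object is an analysis_object already tuned with fine_tuning() using the "rmse" metric (as in the Examples below):

wrap_object <- sensitivity_analysis(
  wrap_object,
  methods = c("PFI"),
  metric = "rmse"
)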
Details
Following the steps of data preprocessing, model fitting, and performance assessment in the MLwrap pipeline, sensitivity_analysis() processes the training and test data using the preprocessing recipe stored in the analysis_object, applies the specified SA methods, and stores the results within the analysis_object. It supports different metrics for evaluation and handles multi-class classification by producing class-specific analyses and plots, ensuring a comprehensive understanding of model behavior (Iooss & Lemaître, 2015).
As the concluding phase of the MLwrap workflow, after data preparation, model training, and evaluation, this function enables users to interpret their models by quantifying and visualizing feature importance. It first validates the input arguments using check_args_sensitivity_analysis(). Then, it preprocesses the training and test data using the recipe stored in analysis_object$transformer. Depending on the specified methods, it calculates feature importance using the approaches below (a brief sketch combining several methods follows the list):
- PFI (Permutation Feature Importance): Assesses importance by shuffling feature values and measuring the change in model performance (using the specified or default metric).

- SHAP (SHapley Additive exPlanations): Computes SHAP values to explain individual predictions by attributing contributions to each feature.

- Integrated Gradients: Evaluates feature importance by integrating gradients of the model's output with respect to input features.

- Olden: Calculates sensitivity based on connection weights, typically for neural network models, to determine feature contributions.

- Sobol_Jansen: Performs variance-based global sensitivity analysis by decomposing the model output variance into contributions from individual features and their interactions, quantifying how much each feature and combination of features accounts for the variability in predictions. Available only for continuous outcomes, not categorical ones. Specifically, it estimates first-order and total-order Sobol' sensitivity indices simultaneously using the Jansen (1999) Monte Carlo estimator.
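The methods argument takes a character vector (its default is c("PFI")), so several of the approaches above can be requested together. A minimal sketch, assuming wrap_object is a fitted analysis_object and leaving metric at its default so PFI falls back to the default metric:

wrap_object <- sensitivity_analysis(
  wrap_object,
  methods = c("PFI", "SHAP")
)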
For classification tasks with more than two outcome levels, the function generates separate results and plots for each class. Visualizations include bar plots for importance metrics, box plots for the distribution of values, and beeswarm plots for detailed feature impact across observations. All results are stored in the analysis_object under the sensitivity_analysis slot, finalizing the MLwrap pipeline with a deep understanding of model drivers.
Value
An updated analysis_object with the results of the sensitivity analysis stored in the sensitivity_analysis slot as a list. Each method's results are accessible under named elements (e.g., sensitivity_analysis[["PFI"]]). Additionally, the function produces various plots (bar plots, box plots, beeswarm plots) for visual interpretation of feature importance, tailored to the task type and number of outcome levels, completing the MLwrap workflow with actionable model insights.
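As a minimal sketch of reading the stored list afterwards, assuming wrap_object has been run through sensitivity_analysis() with the "PFI" method; the $ accessor below is an assumption (only the slot name and its named-list structure are documented), and helpers such as table_shap_results() shown in the Examples are the packaged route to the results:

# The $ accessor is assumed; only the sensitivity_analysis slot and its
# named-list structure are documented.
pfi_results <- wrap_object$sensitivity_analysis[["PFI"]]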
References
Iooss, B., & Lemaître, P. (2015). A review on global sensitivity analysis methods. In C. Meloni & G. Dellino (Eds.), Uncertainty Management in Simulation-Optimization of Complex Systems: Algorithms and Applications (pp. 101-122). Springer. https://doi.org/10.1007/978-1-4899-7547-8_5
Jansen, M. J. W. (1999). Analysis of variance designs for model output. Computer Physics Communications, 117(1-2), 35–43. https://doi.org/10.1016/S0010-4655(98)00154-4
Examples
# Example: Using SHAP on a regression task
library(MLwrap)

data(sim_data)  # sim_data is a simulated dataset with psychological variables

wrap_object <- preprocessing(
  df = sim_data,
  formula = psych_well ~ depression + emot_intel + resilience + life_sat,
  task = "regression"
)

wrap_object <- build_model(
  analysis_object = wrap_object,
  model_name = "Random Forest",
  hyperparameters = list(
    mtry = 3,
    trees = 20
  )
)

wrap_object <- fine_tuning(wrap_object,
  tuner = "Grid Search CV",
  metrics = c("rmse")
)

wrap_object <- sensitivity_analysis(wrap_object, methods = "SHAP")

# Extracting Results
table_shap <- table_shap_results(wrap_object)

# Plotting SHAP Results
wrap_object %>%
  plot_shap()
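
# A further hedged sketch: PFI could be requested on the same fitted object,
# paired with an explicit metric ("rmse" matches the metric used in
# fine_tuning() above); the call simply follows the Usage signature.
wrap_object <- sensitivity_analysis(wrap_object, methods = "PFI", metric = "rmse")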