eval_cond_stats_parity {fairmetrics} | R Documentation |
Examine Conditional Statistical Parity of a Model
Description
This function evaluates conditional statistical parity, which measures fairness by comparing positive prediction rates across sensitive groups within a defined subgroup of the population. This is useful in scenarios where fairness should be evaluated in a more context-specific way—e.g., within a particular hospital unit or age bracket. Conditional statistical parity is a refinement of standard statistical parity. Instead of comparing prediction rates across groups in the entire dataset, it restricts the comparison to a specified subset of the population, defined by a conditioning variable.
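Conceptually, the comparison can be reproduced by hand: restrict the data to the conditioning subgroup, then compare the rate of predictions above the cutoff across levels of the sensitive attribute. The sketch below is purely illustrative; the data and the column names (group, unit, prob) are simulated and not part of the package.

library(dplyr)
set.seed(1)
# Simulated data: a sensitive attribute, a conditioning variable, and
# predicted probabilities from some model.
df <- data.frame(
  group = sample(c("A", "B"), 200, replace = TRUE),
  unit  = sample(c("MICU", "SICU"), 200, replace = TRUE),
  prob  = runif(200)
)
# Conditional statistical parity: within the conditioning subgroup
# (unit == "MICU"), compare the positive prediction rate across groups.
df %>%
  dplyr::filter(unit == "MICU") %>%
  dplyr::group_by(group) %>%
  dplyr::summarise(PPR = mean(prob >= 0.5))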
Usage
eval_cond_stats_parity(
  data,
  outcome,
  group,
  group2,
  condition,
  probs,
  confint = TRUE,
  cutoff = 0.5,
  bootstraps = 2500,
  alpha = 0.05,
  message = TRUE,
  digits = 2
)
Arguments
data
Data frame containing the outcome, predicted outcome, and sensitive attribute.

outcome
Name of the outcome variable; it must be binary.

group
Name of the sensitive attribute.

group2
Name of the variable to condition on.

condition
If the conditioning variable is categorical, the condition supplied must be a character string giving the level(s) to condition on. If the conditioning variable is continuous, the condition supplied must be a character string containing a comparison operator and the value used to threshold the variable (e.g. "<50", ">50", "<=50", ">=50").

probs
Name of the predicted outcome variable.

confint
Whether to compute the 95% confidence interval; default is TRUE.

cutoff
Threshold for the predicted outcome; default is 0.5.

bootstraps
Number of bootstrap samples; default is 2500.

alpha
The significance level for the confidence interval, i.e. a (1 - alpha) confidence interval is computed; default is 0.05.

message
Logical; if TRUE (default), prints a textual summary of the fairness evaluation. Only applies when confint = TRUE.

digits
Number of digits to round the results to; default is 2.
Details
The function supports both categorical and continuous conditioning variables. For continuous variables, you can supply a threshold expression such as "<50" or ">=75" to the condition parameter.
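As a minimal sketch of both cases (assuming a data frame like the test_data built in the Examples below; the "age" column name is illustrative and should be replaced by a column present in your data):

# Categorical conditioning: restrict the comparison to one service unit
eval_cond_stats_parity(
  data = test_data, outcome = "day_28_flg", group = "gender",
  group2 = "service_unit", condition = "MICU", probs = "pred"
)
# Continuous conditioning: restrict the comparison to patients aged 65 or older
eval_cond_stats_parity(
  data = test_data, outcome = "day_28_flg", group = "gender",
  group2 = "age", condition = ">=65", probs = "pred"
)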
Value
A list containing the following elements:
Conditions: The conditions used to calculate the conditional PPR
PPR_Group1: Positive Prediction Rate for the first group
PPR_Group2: Positive Prediction Rate for the second group
PPR_Diff: Difference in Positive Prediction Rate
PPR_Ratio: Ratio of Positive Prediction Rates

If confidence intervals are computed (confint = TRUE):

PPR_Diff_CI: A vector of length 2 containing the lower and upper bounds of the 95% confidence interval for the difference in Positive Prediction Rate

PPR_Ratio_CI: A vector of length 2 containing the lower and upper bounds of the 95% confidence interval for the ratio of Positive Prediction Rates
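As a minimal sketch of working with the returned object (assuming the element names listed above and the test_data object constructed in the Examples):

res <- eval_cond_stats_parity(
  data = test_data, outcome = "day_28_flg", group = "gender",
  group2 = "service_unit", condition = "MICU", probs = "pred",
  cutoff = 0.41
)
res$PPR_Diff     # point estimate of the difference in positive prediction rates
res$PPR_Diff_CI  # lower and upper bounds of the bootstrap confidence interval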
See Also
Examples
library(fairmetrics)
library(dplyr)
library(magrittr)
library(randomForest)
data("mimic_preprocessed")
set.seed(123)
train_data <- mimic_preprocessed %>%
  dplyr::filter(dplyr::row_number() <= 700)
# Fit a random forest model
rf_model <- randomForest::randomForest(factor(day_28_flg) ~ ., data = train_data, ntree = 1000)
# Test the model on the remaining data
test_data <- mimic_preprocessed %>%
  dplyr::mutate(gender = ifelse(gender_num == 1, "Male", "Female")) %>%
  dplyr::filter(dplyr::row_number() > 700)
test_data$pred <- predict(rf_model, newdata = test_data, type = "prob")[, 2]
# Fairness evaluation
# We will use gender as the sensitive attribute and day_28_flg as the outcome.
# We choose threshold = 0.41 so that the overall FPR is around 5%.
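# One way to arrive at such a cutoff (an illustrative sketch, not part of the
# original example): since FPR = P(pred >= cutoff | outcome == 0), the 95th
# percentile of predicted probabilities among observed negatives yields an
# overall FPR of roughly 5% on the test data.
quantile(test_data$pred[test_data$day_28_flg == 0], probs = 0.95)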
# Evaluate Conditional Statistical Parity
eval_cond_stats_parity(
  data = test_data,
  outcome = "day_28_flg",
  group = "gender",
  group2 = "service_unit",
  condition = "MICU",
  probs = "pred",
  cutoff = 0.41
)