ols_bcm {MLBC} | R Documentation |
Multiplicative bias-corrected OLS (BCM)
Description
Performs a multiplicative bias correction to regressions that include a binary covariate generated by AI/ML. This method requires an external estimate of the false-positive rate. Standard errors are adjusted to account for uncertainty in the false-positive rate estimate.
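To see why a correction is needed, the following base-R simulation is a minimal sketch, not the package's estimator. It assumes a hypothetical binary regressor `x` whose ML-generated version `xhat` contains false positives only, and compares the OLS slope obtained from each. All variable names and parameter values are illustrative assumptions.

```r
set.seed(1)
n <- 10000
x <- rbinom(n, 1, 0.3)                 # true (unobserved) binary regressor
false_pos <- rbinom(n, 1, 0.1)         # hypothetical: 10% of true zeros get flagged
xhat <- ifelse(x == 1, 1, false_pos)   # ML-generated label with false positives only
y <- 1 + 2 * x + rnorm(n)              # outcome; the true slope is 2

# OLS slope using the true regressor vs. the ML-generated one
slope_true  <- unname(coef(lm(y ~ x))["x"])
slope_noisy <- unname(coef(lm(y ~ xhat))["xhat"])
c(true_x = slope_true, generated_xhat = slope_noisy)
```

In this setup the slope on `xhat` is pulled toward zero relative to the slope on `x`, which is the kind of bias the correction targets.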
Usage
ols_bcm(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)
## Default S3 method:
ols_bcm(
  Y,
  Xhat,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)
## S3 method for class 'formula'
ols_bcm(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)
Arguments
Y |
numeric response vector, or a two-sided formula |
Xhat |
numeric matrix of regressors (if |
fpr |
numeric; estimated false-positive rate of the ML regressor |
m |
integer; size of the external sample used to estimate the classifier's false-positive rate. Can be set to a large number when the false-positive rate is known exactly |
data |
data frame (if |
intercept |
logical; if |
gen_idx |
integer; 1-based index of the ML-generated variable to apply bias correction to. If not specified, defaults to the first non-intercept variable |
... |
unused |
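The note on `m` can be illustrated with the usual binomial standard error of an estimated proportion. This is only a sketch of why a very large `m` corresponds to a (nearly) known false-positive rate; that this mirrors the package's own variance adjustment is an assumption, since its exact form is not given here.

```r
fpr <- 0.009                 # estimated false-positive rate (value from the example)
m   <- c(100, 1000, 1e6)     # hypothetical external sample sizes

# Binomial standard error of a proportion estimated from m observations
se_fpr <- sqrt(fpr * (1 - fpr) / m)
data.frame(m = m, se_fpr = se_fpr)
```

The uncertainty in `fpr` shrinks at rate `1 / sqrt(m)`, so a large `m` makes its contribution to the standard errors negligible.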
Value
An object of class mlbc_fit and mlbc_bcm with:
- coef: bias-corrected coefficient estimates (ML-slope first, other slopes, intercept last)
- vcov: adjusted variance-covariance matrix for those coefficients
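As a rough illustration of how `coef` and `vcov` fit together, the sketch below derives standard errors and normal-approximation confidence intervals from the two components. The `fit` object is a hypothetical plain-list stand-in with made-up numbers, and accessing the components with `$` is an assumption about the returned object.

```r
# Hypothetical stand-in for a fitted object; all numbers are made up
fit <- list(
  coef = c(ml_slope = 0.25, other_slope = -0.10, intercept = 3.00),
  vcov = diag(c(0.0025, 0.0004, 0.0100))
)

se <- sqrt(diag(fit$vcov))          # standard errors from the diagonal of vcov
z  <- fit$coef / se                 # z-statistics
crit <- qnorm(0.975)                # two-sided 95% critical value

data.frame(
  estimate  = fit$coef,
  std_error = se,
  z         = z,
  lower     = fit$coef - crit * se,
  upper     = fit$coef + crit * se
)
```

The ordering of the rows follows the `coef` vector, so with a real fit the ML-generated variable would come first and the intercept last.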
Usage Options
Option 1: Formula Interface
- Y: A two-sided formula (response on the left, regressors on the right)
- data: Data frame containing the variables referenced in the formula
Option 2: Array Interface
- Y: Response variable vector
- Xhat: Design matrix of covariates
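The array interface can be sketched as follows. The toy data, column names, and the `fpr` and `m` values are hypothetical. It is also an assumption that `Xhat` should omit the intercept column when `intercept = TRUE`. The call is guarded so the snippet still runs when the package is not installed.

```r
set.seed(2)
n <- 200
toy <- data.frame(
  y     = rnorm(n),
  label = rbinom(n, 1, 0.3),   # hypothetical ML-generated binary variable
  z     = rnorm(n)             # hypothetical control variable
)

Y    <- toy$y
Xhat <- cbind(label = toy$label, z = toy$z)   # ML-generated variable in column 1

# gen_idx = 1 points at the ML-generated column of Xhat
if (requireNamespace("MLBC", quietly = TRUE)) {
  fit_arr <- MLBC::ols_bcm(Y, Xhat, fpr = 0.05, m = 500, gen_idx = 1)
}
```

Placing the ML-generated variable in the first column keeps the call consistent with the default `gen_idx = 1`.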
Examples
library(MLBC)
# Load the remote work dataset
data(SD_data)
# Formula interface
fit_bcm <- ols_bcm(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
                   data = SD_data,
                   fpr = 0.009,  # estimated false positive rate
                   m = 1000)     # validation sample size
summary(fit_bcm)
# Compare with uncorrected OLS
fit_ols <- ols(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
               data = SD_data)
# Display coefficient comparison
data.frame(
  OLS = coef(fit_ols)[1:2],
  BCM = coef(fit_bcm)[1:2]
)