PIE_fit {PIE} | R Documentation |
PIE: Partially Interpretable Model
Description
Partially Interpretable Estimators (PIE) jointly train an interpretable model and a black-box model to achieve high predictive performance together with partial model transparency. PIE attributes a prediction to contributions from individual features through a linear additive model, which provides interpretability, and complements that prediction with a black-box model to boost predictive performance. Experimental results show that PIE achieves accuracy comparable to state-of-the-art black-box models on tabular data. In addition, human evaluations indicate that the understandability of PIE is close to that of linear models.
Usage
PIE_fit(X, y, lasso_group, X_orig, lambda1, lambda2, iter, eta, nrounds, ...)
Arguments
X |
A matrix for the dataset features with numerical splines. |
y |
A vector for the dataset target label. |
lasso_group |
A vector that indicates the groups used by the group lasso penalty. |
X_orig |
A matrix for the dataset features without numerical splines. |
lambda1 |
A numeric value for the group lasso penalty. The larger the value, the larger the penalty. |
lambda2 |
A numeric value for the black-box model. The larger the value, the larger the contribution of the XGBoost model. |
iter |
A numeric value for the number of iterations. |
eta |
A numeric value for the learning rate of the XGBoost model. |
nrounds |
A numeric value for the number of rounds of the XGBoost model. |
... |
Additional arguments passed to the XGBoost function. |
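To make the relationship between X, X_orig and lasso_group more concrete, the following base-R sketch expands two numeric columns into spline bases and records which original feature each expanded column came from. In the package this preparation is handled by data_process (see Examples). The use of splines::bs, the choice of df = 4, the toy data, and the assumption that lasso_group holds one label per expanded column are illustrative only and are not taken from the package internals.

```r
library(splines)  # ships with base R

set.seed(42)

# Hypothetical raw features (plays the role of X_orig)
X_orig <- cbind(a = runif(50), b = rnorm(50))

# Expand every column into a B-spline basis with 4 columns
# (df = 4 is an arbitrary choice for this sketch)
df <- 4
bases <- lapply(seq_len(ncol(X_orig)), function(j) bs(X_orig[, j], df = df))
X <- do.call(cbind, bases)  # plays the role of X

# One group label per expanded column:
# columns 1-4 come from feature 1, columns 5-8 from feature 2
lasso_group <- rep(seq_len(ncol(X_orig)), each = df)

dim(X)
lasso_group
```

Grouping the spline columns this way would let a group lasso penalty keep or drop all basis columns of a feature together, which is what makes the additive part readable feature by feature.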
Details
The PIE_fit function uses the training dataset to train the PIE model by jointly training an interpretable model and a black-box model, achieving high predictive performance as well as partial model transparency.
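One way to picture an interpretable model and a black-box model sharing a prediction is alternating residual fitting. The toy sketch below is not the implementation of PIE_fit: it uses lm as a stand-in for the group lasso, loess as a stand-in for XGBoost, omits lambda1 and lambda2 entirely, and the alternation scheme itself is an assumption made for illustration.

```r
set.seed(1)

# Toy data: a linear effect in x1 plus a nonlinear effect in x2
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + sin(6 * x2) + rnorm(n, sd = 0.1)

lin_pred <- rep(0, n)  # contribution of the interpretable part
bb_pred  <- rep(0, n)  # contribution of the black-box stand-in

for (k in 1:5) {
  # Interpretable part: linear model fitted to what the black box has not explained
  lin_fit  <- lm(r ~ x1 + x2, data = data.frame(r = y - bb_pred, x1, x2))
  lin_pred <- fitted(lin_fit)

  # Black-box stand-in: smoother fitted to what the linear part has not explained
  bb_fit  <- loess(r ~ x1 + x2, data = data.frame(r = y - lin_pred, x1, x2))
  bb_pred <- fitted(bb_fit)
}

rmse <- function(a, b) sqrt(mean((a - b)^2))
rmse(y, lin_pred)            # error of the interpretable part alone
rmse(y, lin_pred + bb_pred)  # error of the combined prediction
```

Comparing the two errors mirrors the idea behind the rrMSE_fit component described under Value, which reports an evaluation for the group lasso alone and for the full model.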
Value
An object of class PIE containing the following components:
Betas |
The coefficients of the group lasso model. |
Trees |
The coefficients of the XGBoost trees. |
rrMSE_fit |
A matrix containing, for each iteration, the evaluation of the group lasso against y and the evaluation of the full model against y. |
GAM_pred |
A matrix containing the contribution of the group lasso in each iteration. |
Tree_pred |
A matrix containing the contribution of the XGBoost model in each iteration. |
best_iter |
The number of the best iteration. |
lambda1 |
The value of lambda1 used to fit the model. |
lambda2 |
The value of lambda2 used to fit the model. |
Examples
# Load the package and the training data
library(PIE)
data("winequality")
# Which columns are numerical?
num_col <- 1:11
# Which columns are categorical?
cat_col <- 12
# Which column is the response?
y_col <- ncol(winequality)
# Data Processing (the first 200 rows are sampled for demonstration)
dat <- data_process(X = as.matrix(winequality[1:200, -y_col]),
                    y = winequality[1:200, y_col],
                    num_col = num_col, cat_col = cat_col, y_col = y_col)
# Fit a PIE model
fold <- 1
fit <- PIE_fit(
  X = dat$spl_train_X[[fold]],
  y = dat$train_y[[fold]],
  lasso_group = dat$lasso_group,
  X_orig = dat$orig_train_X[[fold]],
  lambda1 = 0.01, lambda2 = 0.01, iter = 5, eta = 0.05, nrounds = 200
)