PIE_fit {PIE}    R Documentation

PIE: Partially Interpretable Model

Description

Partially Interpretable Estimators (PIE) jointly train an interpretable model and a black-box model to achieve high predictive performance together with partial model transparency. PIE attributes a prediction to contributions from individual features through a linear additive model, which provides interpretability, and complements that prediction with a black-box model to boost predictive performance. Experimental results show that PIE achieves accuracy comparable to state-of-the-art black-box models on tabular data. In addition, human evaluations indicate that the understandability of PIE is close to that of linear models.

Usage

PIE_fit(X, y, lasso_group, X_orig, lambda1, lambda2, iter, eta, nrounds, ...)

Arguments

X

A matrix of the dataset features, with numerical features expanded into splines.

y

A vector of the dataset target labels.

lasso_group

A vector that indicates the group to which each feature belongs for the group lasso.

X_orig

A matrix of the dataset features, without the spline expansion of numerical features.

lambda1

A numeric value for the group lasso penalty. The larger the value, the larger the penalty.

lambda2

A numeric value for the black-box model. The larger the value, the larger the contribution of the XGBoost model.

iter

The number of iterations.

eta

A numeric value for the learning rate of the XGBoost model.

nrounds

The number of boosting rounds of the XGBoost model.

...

Additional arguments passed to the XGBoost function.
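
The relationship between X, X_orig and lasso_group can be illustrated with a small base R sketch. It assumes that each numerical feature is expanded into a B-spline basis and that all basis columns of one feature share a single group id. That layout is an assumption made for illustration, not a description of what data_process actually produces; the names df_per_feature and spline_blocks are hypothetical.

```r
# Illustrative sketch only: the exact layout produced by data_process()
# may differ. Assumption: one group id per original feature, shared by
# all spline basis columns derived from that feature.
library(splines)

set.seed(42)
X_orig <- cbind(a = runif(50), b = runif(50), c = runif(50))

df_per_feature <- 4  # hypothetical number of basis columns per feature

# Expand every column of X_orig into a B-spline basis
spline_blocks <- lapply(seq_len(ncol(X_orig)), function(j) {
  bs(X_orig[, j], df = df_per_feature)
})
X <- do.call(cbind, spline_blocks)

# One group id per original feature, repeated for each basis column
lasso_group <- rep(seq_len(ncol(X_orig)), each = df_per_feature)

dim(X)              # 50 rows, 12 spline columns
table(lasso_group)  # 4 columns in each of the 3 groups
```

With this layout, a group lasso penalty can shrink all spline columns of a feature to zero together, which is what makes the selection operate on features rather than on individual basis columns.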

Details

The PIE_fit function uses the training dataset to fit the PIE model by jointly training an interpretable model (a group lasso) and a black-box model (XGBoost), aiming for high predictive performance together with partial model transparency.
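
As a rough intuition for joint training, the toy sketch below alternates between an interpretable fit and a flexible fit on the remaining residual. It is not the algorithm implemented in PIE_fit: lm() stands in for the group lasso, loess() stands in for XGBoost, the penalties lambda1 and lambda2 are ignored, and every variable name is hypothetical.

```r
# Toy sketch of alternating training in base R.
# NOT the PIE implementation: lm() replaces the group lasso and
# loess() replaces XGBoost; all names here are hypothetical.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + sin(6 * x2) + rnorm(n, sd = 0.1)
dat <- data.frame(x1 = x1, x2 = x2)

n_iter   <- 5
lin_pred <- rep(0, n)  # contribution of the interpretable part
bb_pred  <- rep(0, n)  # contribution of the black-box stand-in

for (it in seq_len(n_iter)) {
  # Interpretable step: fit what the black box has not explained
  dat$r    <- y - bb_pred
  lin_fit  <- lm(r ~ x1 + x2, data = dat)
  lin_pred <- as.numeric(predict(lin_fit))

  # Black-box step: fit the residual left by the interpretable part
  dat$r   <- y - lin_pred
  bb_fit  <- loess(r ~ x1 + x2, data = dat)
  bb_pred <- as.numeric(predict(bb_fit))
}

rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
rmse_linear <- rmse(y, lin_pred)            # interpretable part alone
rmse_full   <- rmse(y, lin_pred + bb_pred)  # both parts combined
```

Comparing rmse_linear with rmse_full mirrors, in spirit, the two evaluations that rrMSE_fit reports per iteration: the interpretable part alone against y, and the full model against y.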

Value

An object of class PIE containing the following components:

Betas

The coefficients of the group lasso model.

Trees

The coefficients of the XGBoost trees.

rrMSE_fit

A matrix containing, for each iteration, the evaluation of the group lasso fit against y and the evaluation of the full model against y.

GAM_pred

A matrix containing the contribution of the group lasso model in each iteration.

Tree_pred

A matrix containing the contribution of the XGBoost model in each iteration.

best_iter

The number of the best iteration.

lambda1

The lambda1 tuning parameter used in PIE.

lambda2

The lambda2 tuning parameter used in PIE.

Examples


# Load the training data
data("winequality")

# Which columns are numerical?
num_col <- 1:11
# Which columns are categorical?
cat_col <- 12
# Which column is the response?
y_col <- ncol(winequality)

# Data Processing (the first 200 rows are sampled for demonstration)
dat <- data_process(X = as.matrix(winequality[1:200, -y_col]), 
  y = winequality[1:200, y_col], 
  num_col = num_col, cat_col = cat_col, y_col = y_col)

# Fit a PIE model
fold <- 1
fit <- PIE_fit(
  X = dat$spl_train_X[[fold]],
  y = dat$train_y[[fold]],
  lasso_group = dat$lasso_group,
  X_orig = dat$orig_train_X[[fold]],
  lambda1 = 0.01, lambda2 = 0.01, iter = 5, eta = 0.05, nrounds = 200
)


[Package PIE version 1.0.0 Index]