e2tree {e2tree}    R Documentation

Explainable Ensemble Tree

Description

Creates an explainable tree for a random forest. Explainable Ensemble Trees (E2Tree) aim to generate a “new tree” that explains and represents the relational structure between the response variable and the predictors. The result is a tree structure similar to that of a single decision tree, which also exploits the advantages of a dendrogram-like output.

Usage

e2tree(
  formula,
  data,
  D,
  ensemble,
  setting = list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
)

Arguments

formula

a formula describing the model to be fitted, with a response but no interaction terms.

data

a data frame containing the variables named in the formula.

D

the dissimilarity matrix, which measures the discordance between pairs of observations with respect to a given classifier of a random forest model. It is obtained with the createDisMatrix function.

ensemble

an ensemble tree object (for the moment, only random forest objects are supported).

setting

a list containing the set of stopping rules for the tree building procedure:

impTotal    The threshold for the impurity in the node.
maxDec      The threshold for the maximum impurity decrease of the node.
n           The minimum number of observations in the node.
level       The maximum depth of the tree (levels).

Default is setting=list(impTotal=0.1, maxDec=0.01, n=2, level=5).
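
As a minimal sketch of how the stopping rules can be customised, the list below starts from the documented defaults and overrides two of them with base R's modifyList(). The names default_setting and custom_setting and the override values are illustrative assumptions. All four elements are kept in the list because this page does not state whether e2tree fills in missing ones.

```r
# Documented defaults for the stopping rules
default_setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)

# Hypothetical overrides: a shallower tree with larger terminal nodes
custom_setting <- modifyList(default_setting, list(n = 5, level = 3))

# Show the resulting list; untouched elements keep their default values
str(custom_setting)
```

A list built this way can then be passed as the setting argument of e2tree().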

Value

An e2tree object, which is a list with the following components:

tree       A data frame representing the main structure of the tree, which explains and graphically represents the relationships and interactions between the variables used by the ensemble method.
call       The matched call.
terms      A list of terms and attributes.
control    A list containing the set of stopping rules for the tree building procedure.
varimp     A list containing a table and a plot of the variable importance. Variable importance quantifies how much each variable contributes to the model's predictions, and so indicates the relative significance of the variables in explaining the observed outcomes.
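
The components above can be reached with the usual list accessors. The sketch below uses a hypothetical helper, check_e2tree_components(), and a mock list standing in for a fitted object so that it runs without the package. Only the component names are taken from this page; the helper and the placeholder contents are assumptions for illustration.

```r
# Component names as documented in the Value section
e2tree_components <- c("tree", "call", "terms", "control", "varimp")

# Hypothetical helper (not part of the e2tree package): reports which of the
# documented components are present in a fitted object.
check_e2tree_components <- function(fit) {
  stopifnot(is.list(fit))
  setNames(e2tree_components %in% names(fit), e2tree_components)
}

# Mock object standing in for the result of e2tree(); contents are placeholders
mock_fit <- list(
  tree    = data.frame(node = 1L),
  call    = quote(e2tree(formula, data, D, ensemble, setting)),
  terms   = list(),
  control = list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5),
  varimp  = list()
)

# Named logical vector, one entry per documented component
present <- check_e2tree_components(mock_fit)
print(present)
```

With a real fit stored in tree, tree$tree gives the data frame describing the tree and tree$control the stopping rules that were used.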

Examples


## Classification:
data(iris)

# Create training and validation set:
set.seed(42)  # fix the seed so the random split is reproducible
smp_size <- floor(0.75 * nrow(iris))
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
training <- iris[train_ind, ]
validation <- iris[-train_ind, ]
response_training <- training[, 5]
response_validation <- validation[, 5]

# Perform training:
## "randomForest" package
ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                       importance = TRUE, proximity = TRUE)

## "ranger" package (alternative; overwrites the object above)
ensemble <- ranger::ranger(Species ~ ., data = training,
                           num.trees = 1000, importance = "impurity")

# Dissimilarity matrix from the fitted ensemble
D <- createDisMatrix(ensemble, data = training, label = "Species",
                     parallel = list(active = FALSE, no_cores = 1))

# Stopping rules and E2Tree fit
setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
tree <- e2tree(Species ~ ., training, D, ensemble, setting)



## Regression
data("mtcars")

# Create training and validation set:
set.seed(42)  # fix the seed so the random split is reproducible
smp_size <- floor(0.75 * nrow(mtcars))
train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
training <- mtcars[train_ind, ]
validation <- mtcars[-train_ind, ]
response_training <- training[, 1]
response_validation <- validation[, 1]

# Perform training:
## "randomForest" package
ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 1000,
                                       importance = TRUE, proximity = TRUE)

## "ranger" package (alternative; overwrites the object above)
ensemble <- ranger::ranger(formula = mpg ~ ., data = training,
                           num.trees = 1000, importance = "permutation")

# Dissimilarity matrix from the fitted ensemble
D <- createDisMatrix(ensemble, data = training, label = "mpg",
                     parallel = list(active = FALSE, no_cores = 1))

# Stopping rules (smaller maxDec for regression) and E2Tree fit
setting <- list(impTotal = 0.1, maxDec = 1e-6, n = 2, level = 5)
tree <- e2tree(mpg ~ ., training, D, ensemble, setting)




[Package e2tree version 0.2.0 Index]