e2tree {e2tree} | R Documentation
Explainable Ensemble Tree
Description
Creates an explainable tree for a random forest. Explainable Ensemble Trees (E2Tree) aim to generate a “new tree” that explains and represents the relational structure between the response variable and the predictors. The result is a tree structure similar to that of a single decision tree, which also exploits the advantages of a dendrogram-like output.
Usage
e2tree(
formula,
data,
D,
ensemble,
setting = list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
)
Arguments
formula |
a formula describing the model to be fitted, with a response but no interaction terms.
data |
a data frame containing the variables named in the formula.
D |
the dissimilarity matrix. It measures the discordance between pairs of observations with respect to a given classifier of a random forest model, and is obtained with the createDisMatrix function.
ensemble |
an ensemble tree object (for the moment, only random forest objects are supported).
setting |
a list containing the set of stopping rules for the tree-building procedure.
Default is list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5), as shown in Usage.
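To give some intuition for the kind of object passed as D, the following is a minimal base-R sketch of a co-occurrence-based dissimilarity. It is an illustration only and is not the computation performed by createDisMatrix: the terminal-node assignments in `nodes` are made-up values, and the names `nodes`, `co_occurrence` and `D_toy` are hypothetical.

```r
# Hypothetical terminal-node assignments (made up for illustration):
# 4 observations (rows) x 3 trees (columns).
nodes <- matrix(c(1, 1, 2,
                  1, 1, 2,
                  2, 1, 3,
                  3, 2, 3),
                nrow = 4, byrow = TRUE)

n <- nrow(nodes)
co_occurrence <- matrix(0, n, n)
for (b in seq_len(ncol(nodes))) {
  # outer() flags the pairs of observations that share a terminal node in tree b
  co_occurrence <- co_occurrence + outer(nodes[, b], nodes[, b], "==")
}

# Share of trees in which each pair falls in the same terminal node
co_occurrence <- co_occurrence / ncol(nodes)

# Toy dissimilarity: 0 = always together, 1 = never together
D_toy <- 1 - co_occurrence
print(round(D_toy, 2))
```

Observations 1 and 2 share a node in every tree, so their toy dissimilarity is 0; observations 1 and 4 never do, so it is 1. In practice D should come from createDisMatrix, as in the Examples.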
Value
An e2tree object, which is a list with the following components:
tree | A data frame representing the main structure of the tree, aimed at explaining and graphically representing the relationships and interactions between the variables used by the ensemble method.
call | The matched call.
terms | A list of terms and attributes.
control | A list containing the set of stopping rules for the tree-building procedure.
varimp | A list containing a table and a plot of the variable importance. Variable importance is a quantitative measure of how much each variable contributes to the model's predictions and overall performance. It indicates the relative significance of the variables in explaining the observed outcomes and helps in understanding the relationships within the model.
Examples
## Classification:
data(iris)
# Create training and validation set:
smp_size <- floor(0.75 * nrow(iris))
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
training <- iris[train_ind, ]
validation <- iris[-train_ind, ]
response_training <- training[,5]
response_validation <- validation[,5]
# Perform training:
## Option 1: "randomForest" package
ensemble <- randomForest::randomForest(Species ~ ., data = training,
                                       importance = TRUE, proximity = TRUE)
## Option 2: "ranger" package (alternative to Option 1; overwrites 'ensemble')
ensemble <- ranger::ranger(Species ~ ., data = training,
                           num.trees = 1000, importance = "impurity")
# Dissimilarity matrix computed from the fitted ensemble on the training set
D <- createDisMatrix(ensemble, data = training, label = "Species",
                     parallel = list(active = FALSE, no_cores = 1))
# Stopping rules, then fit the explainable tree
setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)
tree <- e2tree(Species ~ ., training, D, ensemble, setting)
## Regression
data("mtcars")
# Create training and validation set:
smp_size <- floor(0.75 * nrow(mtcars))
train_ind <- sample(seq_len(nrow(mtcars)), size = smp_size)
training <- mtcars[train_ind, ]
validation <- mtcars[-train_ind, ]
response_training <- training[,1]
response_validation <- validation[,1]
# Perform training
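## Option 1: "randomForest" package
ensemble <- randomForest::randomForest(mpg ~ ., data = training, ntree = 1000,
                                       importance = TRUE, proximity = TRUE)
## Option 2: "ranger" package (alternative to Option 1; overwrites 'ensemble')
ensemble <- ranger::ranger(formula = mpg ~ ., data = training,
                           num.trees = 1000, importance = "permutation")
# Dissimilarity matrix computed from the fitted ensemble on the training set
D <- createDisMatrix(ensemble, data = training, label = "mpg",
                     parallel = list(active = FALSE, no_cores = 1))
# Stopping rules (smaller maxDec than the default), then fit the explainable tree
setting <- list(impTotal = 0.1, maxDec = 1e-6, n = 2, level = 5)
tree <- e2tree(mpg ~ ., training, D, ensemble, setting)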
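The regression example differs from the documented defaults only in maxDec. As a small convenience sketch, the stopping rules can be built by overriding a single entry of the default list with base R's `utils::modifyList`, rather than retyping all four values. The name `default_setting` is hypothetical and not part of the package; the values are those shown in Usage.

```r
# Defaults as documented in the Usage section
default_setting <- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)

# Override only the rule that changes; the other entries keep their defaults
setting <- utils::modifyList(default_setting, list(maxDec = 1e-6))

# Inspect the resulting list before passing it to e2tree()
str(setting)
```

The resulting `setting` can then be passed as the last argument of e2tree, exactly as in the examples above.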