PrInDTAll {PrInDT} | R Documentation |
Conditional inference tree (ctree) based on all observations
Description
ctree based on all observations in 'datain'. Interpretability is checked (see 'ctestv'); probability threshold can be specified. The parameters 'conf.level', 'minsplit', and 'minbucket' can be used to control the size of the trees.
Reference
Weihs, C., Buschfeld, S. 2021a. Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example. arXiv:2103.02336
In the case of repeated measurements ('indrep=1'), the values of the substructure variable have to be given in 'repvar'. Only one value of 'classname' is allowed for each value of 'repvar'. If, for a value of 'repvar', the number of predictions of the observed value of 'classname' does not reach the percentage 'thr' of its observed occurrences, a misclassification is detected.
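The repeated-measurement rule can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy vectors and the helper `element_misclassified` are hypothetical, and reading "not reached" as "strictly below 'thr'" is an assumption.

```r
# Toy data: three elements (values of 'repvar'), each measured repeatedly.
# Every element has exactly one observed class, as the rule requires.
repvar    <- c("a", "a", "a", "a", "b", "b", "b", "c", "c")
observed  <- c("yes", "yes", "yes", "yes", "no", "no", "no", "yes", "yes")
predicted <- c("yes", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes")
thr <- 0.5

# Hypothetical helper: flag an element when the share of its predictions
# that match its observed class is below 'thr' (assumed reading of the rule).
element_misclassified <- function(repvar, observed, predicted, thr = 0.5) {
  share_correct <- tapply(predicted == observed, repvar, mean)
  share_correct < thr
}

flags <- element_misclassified(repvar, observed, predicted, thr)
print(flags)
```

In this toy example only element "b" is flagged, because just one of its three predictions matches the observed class.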
Usage
PrInDTAll(datain, classname, ctestv=NA, conf.level=0.95, thres=0.5,
minsplit=NA,minbucket=NA,repvar=NA,indrep=0,thr=0.5)
Arguments
- datain
Input data frame with class factor variable 'classname' and the
- classname
Name of class variable (character)
- ctestv
Vector of character strings of forbidden split results;
- conf.level
(1 - significance level) in function 'ctree' ('mincriterion'); default = 0.95
- thres
Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5
- minsplit
Minimum number of elements in a node to be split;
- minbucket
Minimum number of elements in a node;
- repvar
Values of variable defining the substructure in the case of repeated measurements; default = NA
- indrep
Indicator of repeated measurements ('indrep=1'); default = 0
- thr
Threshold for element classification: minimum percentage of correct class entries; default = 0.5
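The effect of 'thres' can be sketched in base R. This is an illustration only, not the package's code: the probabilities and the helper `predict_with_threshold` are hypothetical, and the rule "predict the smaller class when its probability exceeds 'thres'" is an assumption about how the threshold is applied.

```r
# Hypothetical predicted probabilities of the smaller class for five observations.
p_small <- c(0.10, 0.35, 0.50, 0.65, 0.90)

# Assumed rule: predict the smaller class when its probability exceeds 'thres'.
predict_with_threshold <- function(p_small, thres = 0.5,
                                   labels = c("large", "small")) {
  factor(ifelse(p_small > thres, labels[2], labels[1]), levels = labels)
}

pred_default <- predict_with_threshold(p_small)               # thres = 0.5
pred_lowered <- predict_with_threshold(p_small, thres = 0.3)  # favors the smaller class

print(table(pred_default))
print(table(pred_lowered))
```

Lowering the threshold assigns more observations to the smaller class, which can help when the classes are unbalanced.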
Details
Standard output can be produced by means of print(name) or just name, as well as plot(name), where 'name' is the output data frame of the function.
Value
- treeall
ctree based on all observations
- baAll
balanced accuracy of 'treeall'
- interpAll
criterion of interpretability of 'treeall' (TRUE / FALSE)
- confAll
confusion matrix of 'treeall'
- acc1AE
Accuracy of full sample tree on Elements of large class
- acc2AE
Accuracy of full sample tree on Elements of small class
- bamaxAE
balanced accuracy of full sample tree on Elements
- namA1
Names of misclassified Elements by full sample tree of large class
- namA2
Names of misclassified Elements by full sample tree of small class
- lablarge
Label of large class
- labsmall
Label of small class
- thr
Threshold for repeated measurements
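As a reading aid for the accuracy entries above, balanced accuracy is commonly defined as the mean of the class-wise accuracies. The sketch below computes it from a confusion matrix in base R. The matrix is made-up toy data, and its orientation (rows = predicted, columns = observed) is an assumption that may differ from the layout of 'confAll'.

```r
# Hypothetical confusion matrix: rows = predicted class, columns = observed class.
conf <- matrix(c(90, 10,
                 10, 30),
               nrow = 2, byrow = TRUE,
               dimnames = list(predicted = c("large", "small"),
                               observed  = c("large", "small")))

# Class-wise accuracy: correctly predicted cases divided by observed cases per class.
class_acc <- diag(conf) / colSums(conf)

# Balanced accuracy: unweighted mean of the class-wise accuracies.
balanced_acc <- mean(class_acc)

# Overall accuracy for comparison; it is dominated by the large class.
overall_acc <- sum(diag(conf)) / sum(conf)

print(class_acc)
print(balanced_acc)
print(overall_acc)
```

Because both classes carry the same weight, balanced accuracy does not reward a tree that simply predicts the large class most of the time.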
Examples
datastrat <- PrInDT::data_zero
data <- na.omit(datastrat)
ctestv <- rbind('ETH == {C2a,C1a}','MLU == {1, 3}')
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
outAll <- PrInDTAll(data,"real",ctestv,conf.level)
print(outAll) # print model based on all observations
plot(outAll) # plot model based on all observations