measure_stability {optRF} | R Documentation |
Measure the stability of random forest
Description
Measure the stability of random forest for a given data set and one or more numbers of trees by repeating random forest several times.
Usage
measure_stability(
y,
X,
num.trees = 500,
method = c("prediction", "importance"),
X_Test = NULL,
alpha = NULL,
select_for = c("high", "low", "zero"),
importance = c("permutation", "impurity", "impurity_corrected"),
number_repetitions = 10,
verbose = TRUE,
...
)
Arguments
y |
A vector containing the response variable in the training data set. |
X |
A data frame containing the explanatory variables in the training data set. The number of rows must be equal to the number of elements in y. |
num.trees |
Either a single value or a vector containing the numbers of trees for which the stability should be analysed (default = 500). |
method |
Either "prediction" (default) or "importance", specifying whether random forest should be used for prediction or to estimate the variable importance. |
X_Test |
If method is "prediction", a data frame containing the explanatory variables of the test data set. If not provided, the out-of-bag data will be used. |
alpha |
If method is "prediction", the number of best individuals to be selected in the test data set (default = 0.15, i.e. a proportion of the individuals). If method is "importance", the number of most important variables to be selected (default = 0.05, i.e. a proportion of the variables). |
select_for |
If method is "prediction", specifies which individuals should be selected. In random forest classification, this must be set to a vector containing the values of the desired classes. In random forest regression, this can be set to "high" (default) to select the individuals with the highest predicted values, "low" to select the individuals with the lowest predicted values, or "zero" to select the individuals whose predicted values are closest to zero. |
importance |
If method is "importance", the variable importance mode, one of "permutation" (default), "impurity" or "impurity_corrected". |
number_repetitions |
Number of times random forest is repeated to estimate the stability. Must be at least 2 (default = 10). |
verbose |
Logical. If TRUE (default), the computation status is shown. |
... |
Any other argument accepted by the ranger function. |
Value
A data frame summarising the estimated stability for the given num.trees values.
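To give an intuition for what a stability estimate from repeated runs captures, the following base R sketch may help. It is not the optRF implementation: the "predictions" are simulated rather than produced by random forest, the run-to-run noise is assumed to shrink with 1/sqrt(num_trees), and the helpers `simulate_runs`, `prediction_stability` and `selection_overlap` are hypothetical names introduced only for this illustration. Stability is approximated here by the mean pairwise correlation between repetitions and by the mean pairwise overlap of the selected top fraction `alpha`, which may differ from the metrics the package reports.

```r
# Illustrative only: NOT the optRF implementation.
# Simulated predictions stand in for repeated random forest runs.
set.seed(1)

n_individuals <- 200
number_repetitions <- 10
alpha <- 0.15  # proportion of individuals to select

true_signal <- rnorm(n_individuals)

# Hypothetical helper: one column of simulated predictions per repetition.
# Assumption: run-to-run noise shrinks as the number of trees grows.
simulate_runs <- function(num_trees) {
  noise_sd <- 1 / sqrt(num_trees)
  replicate(number_repetitions,
            true_signal + rnorm(n_individuals, sd = noise_sd))
}

# Mean pairwise Pearson correlation between the repetitions.
prediction_stability <- function(pred) {
  cors <- cor(pred)
  mean(cors[upper.tri(cors)])
}

# Mean pairwise overlap of the individuals with the highest predictions.
selection_overlap <- function(pred, alpha) {
  k <- ceiling(alpha * nrow(pred))
  top <- apply(pred, 2, function(p) order(p, decreasing = TRUE)[seq_len(k)])
  pairs <- combn(ncol(pred), 2)
  mean(apply(pairs, 2, function(ij) {
    length(intersect(top[, ij[1]], top[, ij[2]])) / k
  }))
}

# One summary row per number of trees.
num.trees <- c(10, 100, 1000)
rows <- lapply(num.trees, function(nt) {
  pred <- simulate_runs(nt)
  data.frame(num.trees = nt,
             prediction_stability = prediction_stability(pred),
             selection_overlap = selection_overlap(pred, alpha))
})
result <- do.call(rbind, rows)
result
```

In this simulation both measures move towards 1 as the number of trees increases, which is the kind of trend a stability summary per num.trees value is meant to expose.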
Examples
## Not run:
data(SNPdata)
set.seed(123)
stability_result <- measure_stability(y = SNPdata[, 1], X = SNPdata[, -1], num.trees = 500)
stability_result # Stability of random forest with 500 trees
## End(Not run)
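The arguments described above can also be combined. The following is a hedged sketch that uses only arguments listed in the Usage section: it passes a vector to num.trees and then switches to method = "importance". The particular values (250, 500, 1000 trees, 5 repetitions) are arbitrary choices for illustration, and the code assumes the optRF package and its SNPdata data set are installed.

```r
library(optRF)
data(SNPdata)
set.seed(123)

# Prediction stability for several numbers of trees in one call.
# The first column of SNPdata is used as the response, as in the example above.
stability_pred <- measure_stability(
  y = SNPdata[, 1],
  X = SNPdata[, -1],
  num.trees = c(250, 500, 1000),  # arbitrary values for illustration
  method = "prediction",
  alpha = 0.15,                   # documented default for "prediction"
  select_for = "high",
  number_repetitions = 5,         # fewer than the default of 10 to save time
  verbose = FALSE
)
stability_pred

# Variable importance stability with permutation importance.
stability_imp <- measure_stability(
  y = SNPdata[, 1],
  X = SNPdata[, -1],
  num.trees = 500,
  method = "importance",
  importance = "permutation",
  alpha = 0.05,                   # documented default for "importance"
  verbose = FALSE
)
stability_imp
```

Because each call repeats random forest number_repetitions times for every value in num.trees, the run time grows with both arguments.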