find_gross {outlierMBC} | R Documentation
Find gross outliers.
Description
The distance from each observation to its k-th nearest neighbour (its kNN
distance, with k given by k_neighbours) is computed. We assume that the largest
max_out kNN distances correspond to potential outliers. We select the next
largest kNN distance, outside of the top max_out, as a benchmark value. We
multiply this benchmark kNN distance by multiplier to get the minimum threshold
for gross outliers. In other words, a gross outlier must have a kNN distance at
least multiplier times as large as that of every observation which we do not
consider to be a potential outlier.
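The thresholding rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: sketch_gross_threshold is a hypothetical helper, Euclidean distances are assumed, scaling is omitted, and "at least" is read as >=.

```r
# Hypothetical helper illustrating the rule; not part of outlierMBC.
sketch_gross_threshold <- function(x, max_out, multiplier = 3, k = 5) {
  d <- as.matrix(dist(x))  # pairwise Euclidean distances (assumption)
  # k-th nearest neighbour distance: position k + 1 after sorting each row,
  # because position 1 is the zero distance from a point to itself.
  knn_dist <- unname(apply(d, 1, function(row) sort(row)[k + 1]))
  # Benchmark: the largest kNN distance outside of the top max_out.
  benchmark <- sort(knn_dist, decreasing = TRUE)[max_out + 1]
  threshold <- multiplier * benchmark
  list(threshold = threshold, is_gross = knn_dist >= threshold)
}

set.seed(1)
# 100 standard normal points plus one point placed far from the rest.
x <- rbind(matrix(rnorm(200), ncol = 2), c(50, 50))
res <- sketch_gross_threshold(x, max_out = 10)
which(res$is_gross)
```

The far-away point in row 101 exceeds the threshold here because its kNN distance is many times larger than the benchmark taken from the bulk of the data.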
Usage
find_gross(
x,
max_out,
multiplier = 3,
k_neighbours = floor(nrow(x)/100),
manual_threshold = NULL,
scale = TRUE
)
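The default for k_neighbours in the usage above depends on the number of rows in x. The short sketch below only evaluates that default expression for a few sample sizes; the sizes are arbitrary, and how find_gross behaves when the default evaluates to 0 is not stated here, so passing k_neighbours explicitly for small data sets may be the safer choice.

```r
# Evaluate the default k_neighbours = floor(nrow(x) / 100) for several
# illustrative numbers of rows.
n <- c(50, 99, 100, 250, 1000)
default_k <- floor(n / 100)
data.frame(n = n, default_k = default_k)
```

For fewer than 100 rows the default evaluates to 0.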
Arguments
x
Data.
max_out
Maximum number of outliers.
multiplier
Multiplicative factor used to get gross outlier threshold.
k_neighbours
Number of neighbours for dbscan::kNNdist.
manual_threshold
Optional preset threshold.
scale
Logical value controlling whether scaling is applied. Defaults to TRUE.
Value
find_gross returns a list with the following elements:
gross_choice
A numeric value indicating the elbow's location.
gross_bool
A logical vector identifying the gross outliers.
gross_curve
ggplot of the highest 2 * max_out kNN distances in decreasing order.
gross_scatter
ggplot of all kNN distances in index order.
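A possible call and a look at the returned elements might resemble the sketch below. The simulated data and the choice of max_out are illustrative assumptions, as is passing a numeric matrix as x; the call is guarded so that the code is skipped when outlierMBC is not installed.

```r
set.seed(123)
# 500 standard normal points plus one distant point (simulated data).
x <- rbind(matrix(rnorm(1000), ncol = 2), c(30, 30))

if (requireNamespace("outlierMBC", quietly = TRUE)) {
  # Other arguments are left at the defaults shown under Usage.
  gross <- outlierMBC::find_gross(x, max_out = 20)
  print(gross$gross_choice)        # numeric value described under Value
  print(which(gross$gross_bool))   # indices flagged as gross outliers
  print(gross$gross_curve)         # ggplot of the largest kNN distances
}
```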