find_gross {outlierMBC}    R Documentation

Find gross outliers.

Description

The distance from each observation to its k-th nearest neighbour (its kNN distance, with k set by k_neighbours) is computed. The max_out largest kNN distances are assumed to correspond to potential outliers. The next largest kNN distance, that is, the largest one outside the top max_out, is taken as a benchmark value. This benchmark kNN distance is multiplied by multiplier to give the minimum threshold for gross outliers. In other words, a gross outlier must have a kNN distance at least multiplier times as large as that of every observation not considered a potential outlier.
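
The rule described above can be sketched in a few lines of base R. This is an illustrative sketch only, not the package's source code: it computes kNN distances with dist rather than dbscan::kNNdist, skips the optional scaling step, and assumes that an observation exactly at the threshold counts as a gross outlier. The toy data and the variable names (grid, far, knn_dist, benchmark, threshold) are made up for the illustration.

```r
# Toy data (hypothetical): a regular 10 x 10 grid plus two far-away points.
grid <- as.matrix(expand.grid(x1 = 1:10, x2 = 1:10))
far <- matrix(c(30, 30, -20, 5), ncol = 2, byrow = TRUE)
x <- rbind(grid, far)  # rows 101 and 102 are the far-away points

max_out <- 5
multiplier <- 3
k <- 3

# Distance from each observation to its k-th nearest neighbour.
# Each row of the distance matrix contains the self-distance 0,
# so the k-th neighbour sits at position k + 1 after sorting.
d <- as.matrix(dist(x))
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# Benchmark: the largest kNN distance outside the top max_out.
benchmark <- sort(knn_dist, decreasing = TRUE)[max_out + 1]

# Minimum kNN distance required to be a gross outlier.
threshold <- multiplier * benchmark

# Assumption: ">=" is used here; the page does not say how ties are handled.
gross_bool <- knn_dist >= threshold
which(gross_bool)
```

In this toy example the benchmark is the kNN distance of a corner point of the grid, and only the two far-away points exceed three times that value.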

Usage

find_gross(
  x,
  max_out,
  multiplier = 3,
  k_neighbours = floor(nrow(x)/100),
  manual_threshold = NULL,
  scale = TRUE
)

Arguments

x

Data.

max_out

Maximum number of outliers.

multiplier

Multiplicative factor used to get gross outlier threshold.

k_neighbours

Number of neighbours for dbscan::kNNdist.

manual_threshold

Optional preset threshold.

scale

Logical value controlling whether we apply scale to x.
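
The default for k_neighbours depends on the number of rows of x. As a quick check of what that default evaluates to, the following base R snippet applies the documented expression floor(nrow(x)/100) to a few sample sizes. No package functions are called, and the sample sizes are arbitrary.

```r
# Evaluate the documented default, floor(n / 100), for a few sample sizes n.
n <- c(50, 99, 100, 250, 1000)
default_k <- floor(n / 100)
data.frame(n = n, default_k = default_k)
```

For fewer than 100 rows the expression evaluates to 0, so it may be worth setting k_neighbours explicitly for small data sets. How find_gross handles a value of 0 is not stated on this page.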

Value

find_gross returns a list with the following elements:

gross_choice

A numeric value indicating the elbow's location.

gross_bool

A logical vector identifying the gross outliers.

gross_curve

ggplot of the highest 2 * max_out kNN distances in decreasing order.

gross_scatter

ggplot of all kNN distances in index order.


[Package outlierMBC version 0.0.1 Index]