acc_multivariate_outlier {dataquieR}    R Documentation

Calculate and plot Mahalanobis distances

Description

The Mahalanobis distance is a standard tool for detecting multivariate outliers. It is particularly helpful for judging the plausibility of a measurement given the value of another. In this approach, the Mahalanobis distance is itself treated as a univariate measure, and the same rules are applied to it as for the identification of univariate outliers (see the criteria argument: Tukey, 3SD, Hubert, and sigma-gap).

For further details, please see the vignette on univariate outliers.
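
The idea of reducing a multivariate problem to one distance per observation can be illustrated with base R. This is a minimal sketch and not the package's internal implementation: the data frame dat, its columns sbp and dbp, and the injected last row are made up for illustration, and only stats::mahalanobis() from base R is used.

```r
# Hypothetical example data: two correlated measurements.
set.seed(42)
n <- 200
x <- rnorm(n, mean = 120, sd = 15)
dat <- data.frame(
  sbp = x,
  dbp = 0.6 * x + rnorm(n, mean = 8, sd = 5)
)

# Add one observation whose values are unremarkable one at a time,
# but implausible in combination (low sbp together with high dbp).
dat <- rbind(dat, data.frame(sbp = 95, dbp = 100))

# stats::mahalanobis() returns the SQUARED distance of each row from
# the given center, using the given covariance matrix.
m <- as.matrix(dat)
md_squared <- mahalanobis(m, center = colMeans(m), cov = cov(m))
md <- sqrt(md_squared)

# md holds one value per observation, so any univariate outlier rule
# can now be applied to it.
suspect <- unname(which.max(md))
```

Here suspect is the row index of the observation that lies farthest from the bulk of the data once the correlation between the two variables is taken into account.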

Indicator

Usage

acc_multivariate_outlier(
  variable_group = NULL,
  id_vars = NULL,
  label_col = VAR_NAMES,
  study_data,
  item_level = "item_level",
  n_rules = 4,
  max_non_outliers_plot = 10000,
  criteria = c("tukey", "3sd", "hubert", "sigmagap"),
  meta_data = item_level,
  meta_data_v2,
  scale = getOption("dataquieR.acc_multivariate_outlier.scale",
    dataquieR.acc_multivariate_outlier.scale_default),
  multivariate_outlier_check = TRUE
)

Arguments

variable_group

variable list the names of the continuous measurement variables that form a group for which multivariate outliers make sense.

id_vars

variable optional, an ID variable of the study data. If not specified, row numbers are used.

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

n_rules

numeric from=1 to=4. The number of rules that must be violated for an observation to be classified as an outlier.

max_non_outliers_plot

integer from=0. Maximum number of non-outlier points to be plotted. If more points exist, only a subsample is plotted. Note that the sampling is not deterministic.

criteria

set tukey | 3SD | hubert | sigmagap. A vector of the methods to be used for detecting outliers.

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

scale

logical Should min-max-scaling be applied per variable?

multivariate_outlier_check

logical whether to actually run the check; intended for pipeline use only.
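
How the arguments scale, criteria, n_rules, and max_non_outliers_plot might interact can be sketched in base R. This is an illustration under stated assumptions, not the package's code: all helper names (min_max, rule_tukey, rule_3sd, flag_outliers, subsample_non_outliers) are hypothetical, min-max scaling is assumed to be the conventional mapping to [0, 1], and only the textbook Tukey and 3SD rules are shown. The Hubert and sigma-gap rules are omitted.

```r
# Conventional min-max scaling of one variable to [0, 1] (assumed form).
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(v)))  # constant column: avoid 0/0
  (v - rng[1]) / (rng[2] - rng[1])
}

# Textbook Tukey rule: beyond 1.5 * IQR from the first or third quartile.
rule_tukey <- function(d) {
  q <- quantile(d, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  d < q[1] - 1.5 * iqr | d > q[2] + 1.5 * iqr
}

# Textbook 3SD rule: more than three standard deviations from the mean.
rule_3sd <- function(d) {
  abs(d - mean(d, na.rm = TRUE)) > 3 * sd(d, na.rm = TRUE)
}

# An observation is flagged if it violates at least n_rules rules.
# Only two rules are implemented here, so n_rules is 1 or 2.
flag_outliers <- function(d, n_rules = 2) {
  violations <- rule_tukey(d) + rule_3sd(d)
  violations >= n_rules
}

# Keep all outliers, but at most max_non_outliers_plot non-outliers.
subsample_non_outliers <- function(is_outlier, max_non_outliers_plot) {
  non_out <- which(!is_outlier)
  if (length(non_out) > max_non_outliers_plot) {
    non_out <- non_out[sample.int(length(non_out), max_non_outliers_plot)]
  }
  sort(c(which(is_outlier), non_out))
}

# Hypothetical distances: 100 ordinary values and one extreme value.
d <- c(rep(1:10, 10), 60)
is_out <- flag_outliers(d, n_rules = 2)
keep <- subsample_non_outliers(is_out, max_non_outliers_plot = 20)
```

In this sketch, calling set.seed() before subsample_non_outliers() makes the subsample reproducible. Whether the same holds for the package's own sampling is not stated on this page.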

Value

a list with:

ALGORITHM OF THIS IMPLEMENTATION:

List function.

See Also

Online Documentation


[Package dataquieR version 2.5.1 Index]