acc_margins {dataquieR} R Documentation

Estimate marginal means, see emmeans::emmeans

Description

This function examines the impact of so-called process variables on a measurement variable, combining a descriptive and a model-based approach. Only categorical process variables can be considered, and only one process variable can be analyzed per function call. The measurement variable can be adjusted for one or more covariables, such as age or sex.

Marginal means rest on model-based results, i.e., whether a difference between marginal means is significant depends on sample size. Particularly in large studies, small and irrelevant differences may become significant. Conversely, relevant differences may remain non-significant if the sample size is low.
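
The dependence on sample size can be illustrated with a small base R sketch. It does not call acc_margins or emmeans; the helper name p_for_n and the shift of 0.05 standard deviations are assumptions chosen purely for illustration. Both comparisons use the same small mean difference, and only the number of observations per group changes.

```r
# Hypothetical helper (not part of dataquieR): p-value of a two-sample
# t-test for a fixed, small mean difference at a given group size.
p_for_n <- function(n, shift = 0.05) {
  # Deterministic, approximately standard-normal values (no random draws)
  base <- qnorm(ppoints(n))
  group_a <- base
  group_b <- base + shift
  t.test(group_a, group_b)$p.value
}

p_small <- p_for_n(20)      # few observations per group
p_large <- p_for_n(100000)  # many observations per group

# The same difference is far from significant in the small sample
# and clearly significant in the large one.
print(c(small = p_small, large = p_large))
```

A significant result should therefore be read together with the size of the difference and the number of observations behind it.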

Indicator

Usage

acc_margins(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_type = "empirical",
  threshold_value,
  min_obs_in_subgroup = 5,
  min_obs_in_cat = 5,
  dichotomize_categorical_resp = TRUE,
  cut_off_linear_model_for_ord = 10,
  meta_data = item_level,
  meta_data_v2,
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default),
  include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
    dataquieR.acc_margins_num_default),
  n_violin_max = getOption("dataquieR.max_group_var_levels_with_violins",
    dataquieR.max_group_var_levels_with_violins_default)
)

Arguments

resp_vars

variable the name of the measurement variable

group_vars

variable list len=1-1. the name of the observer, device or reader variable

co_vars

variable list a vector of covariables, e.g. age and sex for adjustment

study_data

data.frame the data frame that contains the measurements

label_col

variable attribute the name of the column in the metadata with labels of variables

item_level

data.frame the data frame that contains metadata attributes of study data

threshold_type

enum empirical | user | none. If empirical is chosen, a multiplier of the scale measure is used. If user is chosen, a value of the mean or probability (binary data) has to be defined (see Implementation and use of thresholds in the online documentation). If none is chosen, no thresholds are displayed and no flagging of unusual group levels is applied.

threshold_value

numeric a multiplier or absolute value (see Implementation and use of thresholds in the online documentation).

min_obs_in_subgroup

integer from=0. This optional argument specifies the minimum number of observations that is required to include a subgroup (level) of the group_vars in the analysis. Subgroups with fewer observations are excluded.
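
The effect of such a minimum can be sketched in base R. This is not the package's internal code: the helper drop_small_subgroups, the toy data and the column names are hypothetical, and counting only rows with a non-missing response and group is an assumption.

```r
# Hypothetical helper (not part of dataquieR): keep only rows whose group
# level has at least min_obs complete observations.
drop_small_subgroups <- function(data, resp, group, min_obs = 5) {
  # Assumption: only rows with non-missing response and group are counted
  complete <- data[!is.na(data[[resp]]) & !is.na(data[[group]]), ,
                   drop = FALSE]
  counts <- table(complete[[group]])
  keep <- names(counts)[counts >= min_obs]
  complete[as.character(complete[[group]]) %in% keep, , drop = FALSE]
}

# Toy data: examiner "B" contributes only two observations
toy <- data.frame(
  value = c(1:6, 10, 11, 20:26),
  examiner = c(rep("A", 6), rep("B", 2), rep("C", 7))
)

filtered <- drop_small_subgroups(toy, "value", "examiner", min_obs = 5)
print(table(filtered$examiner))
```

With min_obs = 5, the rows of examiner "B" are removed, while examiners "A" and "C" are kept.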

min_obs_in_cat

integer This optional argument specifies the minimum number of observations that is required to include a category (level) of the outcome (resp_vars) in the analysis. Categories with fewer observations are combined into one group. If the collapsed category contains fewer observations than required, it will be excluded from the analysis.
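
The collapsing rule can be sketched in base R as follows. The helper collapse_rare_categories and the label "other" for the combined group are hypothetical and only mimic the behavior described above; they are not taken from the package.

```r
# Hypothetical helper (not part of dataquieR): merge categories with fewer
# than min_obs observations into one level and drop that level if it is
# still too small.
collapse_rare_categories <- function(x, min_obs = 5, other = "other") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_obs]
  x[x %in% rare] <- other
  if (sum(x == other, na.rm = TRUE) < min_obs) {
    # The collapsed category is still too small: exclude it
    x[!is.na(x) & x == other] <- NA
  }
  factor(x)
}

# "c", "d" and "e" are rare; together they reach the minimum of 5
resp <- c(rep("a", 8), rep("b", 6), rep("c", 3), rep("d", 3), "e")
collapsed <- collapse_rare_categories(resp)
print(table(collapsed))

# Here the only rare category stays below the minimum and is excluded
excluded <- collapse_rare_categories(c(rep("a", 8), rep("b", 2)))
print(table(excluded, useNA = "ifany"))
```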

dichotomize_categorical_resp

logical Should nominal response variables always be transformed to binary variables?

cut_off_linear_model_for_ord

integer from=0. This optional argument specifies the minimum number of observations that each level of an ordinal outcome (resp_vars) must have in order to run a linear model instead of an ordered regression (i.e., a cut-off value above which linear models are considered a good approximation). The argument can be set to NULL if ordered regression models are preferred for ordinal data in any case.

meta_data

data.frame old name for item_level

meta_data_v2

character path to workbook like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

sort_group_var_levels

logical Should the levels of the grouping variable be sorted descending by the number of observations? Note that ordinal grouping variables will not be reordered.

include_numbers_in_figures

logical Should the figure report the number of observations for each level of the grouping variable?

n_violin_max

integer from=0. This optional argument specifies the maximum number of levels of the group_vars for which violin plots will be shown in the figure.

Details

Limitations

Selecting the appropriate distribution is complex. Dozens of continuous, discrete or mixed distributions are conceivable in the context of epidemiological data. Their exact exploration is beyond the scope of this data quality approach. The present function uses the helper function util_dist_selection, the assigned SCALE_LEVEL and the DATA_TYPE to discriminate the following cases:

Continuous data and count data with more than 20 distinct values are analyzed by linear models. Count data with up to 20 distinct values are modeled by a Poisson regression. For binary data, the implementation uses logistic regression.

Nominal response variables will either be transformed to binary variables or analyzed by multinomial logistic regression models. The latter option is only available if the argument dichotomize_categorical_resp is set to FALSE and if the package nnet is installed. The transformation to a binary variable can be user-specified using the metadata columns RECODE_CASES and/or RECODE_CONTROL. Otherwise, the most frequent category will be assigned to cases and the remaining categories to controls.

For ordinal response variables, the argument cut_off_linear_model_for_ord controls whether the data are analyzed in the same way as continuous data: If every level of the variable has at least as many observations as specified in the argument, the data will be analyzed by a linear model. Otherwise, the data will be modeled by an ordered regression, if the package ordinal is installed.
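
The case distinction above can be summarized as a small decision function. This is a sketch only and not the logic of util_dist_selection or acc_margins: the helper choose_model, its arguments kind, n_distinct and min_level_n, and the labels it accepts and returns are hypothetical and do not correspond to the package's SCALE_LEVEL or DATA_TYPE values.

```r
# Hypothetical helper (not part of dataquieR) mirroring the cases above.
# kind:        illustrative label for the type of response variable
# n_distinct:  number of distinct values (used for count data)
# min_level_n: smallest number of observations in any level (ordinal data)
choose_model <- function(kind, n_distinct = NA, min_level_n = NA,
                         dichotomize_categorical_resp = TRUE,
                         cut_off_linear_model_for_ord = 10) {
  switch(kind,
    continuous = "linear model",
    count = if (n_distinct > 20) "linear model" else "Poisson regression",
    binary = "logistic regression",
    nominal = if (dichotomize_categorical_resp) {
      "logistic regression on dichotomized response"
    } else {
      "multinomial logistic regression"  # requires the package nnet
    },
    ordinal = if (!is.null(cut_off_linear_model_for_ord) &&
                  min_level_n >= cut_off_linear_model_for_ord) {
      "linear model"
    } else {
      "ordered regression"  # requires the package ordinal
    },
    stop("unknown kind: ", kind)
  )
}

print(choose_model("count", n_distinct = 12))
print(choose_model("nominal", dichotomize_categorical_resp = FALSE))
print(choose_model("ordinal", min_level_n = 4))
print(choose_model("ordinal", min_level_n = 50,
                   cut_off_linear_model_for_ord = NULL))
```

Setting cut_off_linear_model_for_ord to NULL in the last call selects the ordered regression regardless of how many observations each level has, in line with the argument description.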

Value

a list with:

See Also

Online Documentation


[Package dataquieR version 2.5.1 Index]