acc_margins {dataquieR} | R Documentation
Estimate marginal means, see emmeans::emmeans
Description
This function examines the impact of so-called process variables on a measurement variable. This implementation combines a descriptive and a model-based approach. Only categorical process variables are supported, and only one process variable can be considered per function call. The measurement variable can be adjusted for one or more covariables, such as age or sex.
Marginal means rest on model-based results, so whether a marginal mean differs significantly depends on the sample size. Particularly in large studies, small and irrelevant differences may become significant. Conversely, if the sample size is low, relevant differences may fail to reach significance.
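The dependence on sample size can be illustrated with a small simulation. This is only a sketch in base R and not part of dataquieR: the helper `p_value_for_n`, the group labels and the effect size of 0.05 standard deviations are assumptions chosen for illustration.

```r
# Illustrative simulation (not dataquieR code): the same small group
# difference is tested at two very different sample sizes.
set.seed(1)

# Hypothetical helper: simulate two groups that differ by `true_diff`
# standard deviations and return the p-value of the group effect.
p_value_for_n <- function(n_per_group, true_diff = 0.05) {
  group <- factor(rep(c("A", "B"), each = n_per_group))
  y <- rnorm(2 * n_per_group,
             mean = ifelse(group == "B", true_diff, 0),
             sd = 1)
  fit <- lm(y ~ group)
  summary(fit)$coefficients["groupB", "Pr(>|t|)"]
}

p_small <- p_value_for_n(50)      # small study
p_large <- p_value_for_n(100000)  # very large study

print(c(small_study = p_small, large_study = p_large))
```

With 100000 observations per group, the standard error of the difference is so small that even this practically negligible difference yields a very small p-value. In the small study the same difference will usually go undetected.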
Usage
acc_margins(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_type = "empirical",
  threshold_value,
  min_obs_in_subgroup = 5,
  min_obs_in_cat = 5,
  dichotomize_categorical_resp = TRUE,
  cut_off_linear_model_for_ord = 10,
  meta_data = item_level,
  meta_data_v2,
  sort_group_var_levels = getOption("dataquieR.acc_margins_sort",
    dataquieR.acc_margins_sort_default),
  include_numbers_in_figures = getOption("dataquieR.acc_margins_num",
    dataquieR.acc_margins_num_default),
  n_violin_max = getOption("dataquieR.max_group_var_levels_with_violins",
    dataquieR.max_group_var_levels_with_violins_default)
)
Arguments
resp_vars |
variable the name of the measurement variable |
group_vars |
variable list len=1-1. the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
study_data |
data.frame the data frame that contains the measurements |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
item_level |
data.frame the data frame that contains metadata attributes of study data |
threshold_type |
enum empirical | user | none. In case |
threshold_value |
numeric a multiplier or absolute value (see
|
min_obs_in_subgroup |
integer from=0. This optional argument specifies
the minimum number of observations that is required to
include a subgroup (level) of the |
min_obs_in_cat |
integer This optional argument specifies the minimum
number of observations that is required to include
a category (level) of the outcome ( |
dichotomize_categorical_resp |
logical Should nominal response variables always be transformed to binary variables? |
cut_off_linear_model_for_ord |
integer from=0. This optional argument
specifies the minimum number of observations for
individual levels of an ordinal outcome ( |
meta_data |
data.frame old name for |
meta_data_v2 |
character path to workbook like metadata file, see
|
sort_group_var_levels |
logical Should the levels of the grouping variable be sorted descending by the number of observations? Note that ordinal grouping variables will not be reordered. |
include_numbers_in_figures |
logical Should the figure report the number of observations for each level of the grouping variable? |
n_violin_max |
integer from=0. This optional argument specifies
the maximum number of levels of the |
Details
Limitations
Selecting the appropriate distribution is complex. Dozens of continuous,
discrete or mixed distributions are conceivable in the context of
epidemiological data. Their exact exploration is beyond the scope of this
data quality approach. The present function uses the helper function
util_dist_selection, the assigned SCALE_LEVEL and the DATA_TYPE to
discriminate the following cases:
continuous data
binary data
count data with <= 20 distinct values
count data with > 20 distinct values (treated as continuous)
nominal data
ordinal data
Continuous data and count data with more than 20 distinct values are analyzed
by linear models. Count data with up to 20 distinct values are modeled by a
Poisson regression. For binary data, the implementation uses logistic
regression.
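The three model families named above can be sketched with base R as follows. This is an illustration under stated assumptions, not the internal code of acc_margins: the data are simulated, and all variable names (`examiner`, `sbp`, `n_visits`, `hypertension`) as well as the helper `count_model` are hypothetical.

```r
# Simulated example data (all names are hypothetical).
set.seed(2)
n <- 200
d <- data.frame(
  examiner = factor(sample(c("E1", "E2", "E3"), n, replace = TRUE)),
  age = round(runif(n, 20, 80))
)
d$sbp <- 120 + 0.3 * d$age + rnorm(n, sd = 10)     # continuous outcome
d$n_visits <- rpois(n, lambda = 2)                 # count outcome
d$hypertension <- rbinom(n, size = 1, prob = 0.3)  # binary outcome

# Hypothetical helper mirroring the 20-distinct-values rule for counts.
count_model <- function(x) {
  if (length(unique(stats::na.omit(x))) <= 20) "poisson" else "linear"
}

# Continuous data: linear model, adjusted for a covariable.
fit_linear <- lm(sbp ~ examiner + age, data = d)

# Count data with few distinct values: Poisson regression.
fit_poisson <- glm(n_visits ~ examiner + age, family = poisson(), data = d)

# Binary data: logistic regression.
fit_logistic <- glm(hypertension ~ examiner + age, family = binomial(),
                    data = d)

# Marginal means per examiner, only if emmeans is installed.
if (requireNamespace("emmeans", quietly = TRUE)) {
  print(emmeans::emmeans(fit_linear, ~ examiner))
}
```

In each model the grouping variable enters as a factor next to the covariables, so the marginal means per group are adjusted for the covariables.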
Nominal response variables will either be transformed to binary variables or
analyzed by multinomial logistic regression models. The latter option is only
available if the argument dichotomize_categorical_resp is set to FALSE and if
the package nnet is installed. The transformation to a binary variable can be
user-specified using the metadata columns RECODE_CASES and/or RECODE_CONTROL.
Otherwise, the most frequent category will be assigned to cases and the
remaining categories to control.
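The default rule for nominal responses, with the most frequent category becoming the cases, can be sketched in base R. The helper `dichotomize_most_frequent` and the example vector are hypothetical. The 1/0 coding and the handling of ties are assumptions, not taken from the dataquieR source.

```r
# Hypothetical helper: recode a nominal variable to binary, assigning the
# most frequent category to cases (1) and all other categories to
# controls (0). Missing values stay missing.
dichotomize_most_frequent <- function(x) {
  x <- as.factor(x)
  counts <- table(x)  # NA values are not counted
  # Assumption: on ties, the first level in factor order is used.
  case_level <- names(counts)[which.max(counts)]
  as.integer(x == case_level)
}

marital <- c("single", "married", "married", "divorced",
             "married", NA, "single")
marital_binary <- dichotomize_most_frequent(marital)
print(marital_binary)
```

Here "married" is the most frequent category and is therefore coded as case. In acc_margins itself the assignment can instead be controlled through the metadata columns RECODE_CASES and RECODE_CONTROL.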
For ordinal response variables, the argument cut_off_linear_model_for_ord
controls whether the data are analyzed in the same way as continuous data:
If every level of the variable has at least as many observations as specified
in the argument, the data will be analyzed by a linear model. Otherwise,
the data will be modeled by an ordered regression, if the package ordinal
is installed.
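The cut-off rule for ordinal responses amounts to a simple check of the level counts. The following base R sketch uses a hypothetical helper `use_linear_model_for_ordinal`. It assumes that only observed levels are counted, which may differ from what acc_margins does internally.

```r
# Hypothetical helper: TRUE if every observed level of an ordinal variable
# has at least `cut_off` observations, so that a linear model would be used.
use_linear_model_for_ordinal <- function(x, cut_off = 10) {
  # Assumption: levels without any observation are ignored.
  counts <- table(droplevels(as.factor(x)))
  all(counts >= cut_off)
}

pain <- factor(
  rep(c("none", "mild", "severe"), times = c(40, 25, 4)),
  levels = c("none", "mild", "severe"),
  ordered = TRUE
)

print(table(pain))
print(use_linear_model_for_ordinal(pain))               # default cut-off 10
print(use_linear_model_for_ordinal(pain, cut_off = 4))  # lower cut-off
```

With the default cut-off of 10, the level "severe" has too few observations, so this variable would be handled by an ordered regression rather than a linear model.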
Value
a list with:
- SummaryTable: data.frame underlying the plot
- ResultData: data.frame
- SummaryPlot: ggplot2::ggplot() margins plot