check_dag {performance} | R Documentation |
Check correct model adjustment for identifying causal effects
Description
The purpose of check_dag() is to build, check and visualize
your model based on directed acyclic graphs (DAG). The function checks if a
model is correctly adjusted for identifying specific relationships of
variables, especially directed (maybe also "causal") effects for given
exposures on an outcome. In case of incorrect adjustments, the function
suggests the minimal required variables that should be adjusted for (sometimes
also called "controlled for"), i.e. variables that at least need to be
included in the model. Depending on the goal of the analysis, it is still
possible to add more variables to the model than just the minimally required
adjustment sets.
check_dag() is a convenient wrapper around ggdag::dagify(),
dagitty::adjustmentSets() and dagitty::adjustedNodes() to check correct
adjustment sets. It returns a dagitty object that can be visualized with
plot(). as.dag() is a small convenient function to return the
dagitty-string, which can be used for the online-tool from the
dagitty-website.
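As a minimal sketch of the workflow described above, the following builds a small DAG and then prints its dagitty string. It assumes that the performance, ggdag and dagitty packages are installed, and that as.dag() accepts the object returned by check_dag(). The guard variables have_pkgs and have_funs are illustrative helpers, not part of the package.

```r
# Sketch only: the example runs when the required packages (and a version of
# 'performance' that exports check_dag() and as.dag()) are available.
pkgs <- c("performance", "ggdag", "dagitty")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
have_funs <- have_pkgs &&
  all(c("check_dag", "as.dag") %in% getNamespaceExports("performance"))

if (have_funs) {
  # Build and check a DAG with one exposure and one additional covariate
  dag <- performance::check_dag(
    y ~ x + b,
    outcome = "y",
    exposure = "x"
  )
  # Print the dagitty string, e.g. for pasting into the dagitty web tool
  performance::as.dag(dag)
}
```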
Usage
check_dag(
...,
outcome = NULL,
exposure = NULL,
adjusted = NULL,
latent = NULL,
effect = "all",
coords = NULL
)
as.dag(x, ...)
Arguments
... |
One or more formulas, which are converted into dagitty syntax.
The first element may also be a model object. If a model object is provided, its
formula is used as the first formula, and all independent variables will be used
for the |
outcome |
Name of the dependent variable (outcome), as character string
or as formula. Must be a valid name from the formulas provided in |
exposure |
Name of the exposure variable (as character string or
formula), for which the direct and total causal effect on the |
adjusted |
A character vector or formula with names of variables that
are adjusted for in the model, e.g. |
latent |
A character vector with names of latent variables in the model. |
effect |
Character string, indicating which effect to check. Can be
|
coords |
Coordinates of the variables when plotting the DAG. The coordinates can be provided in three different ways:
See 'Examples'. |
x |
An object of class |
Value
An object of class check_dag, which can be visualized with plot().
The returned object also inherits from class dagitty and thus can be used
with all functions from the ggdag and dagitty packages.
Specifying the DAG formulas
The formulas have the following syntax:
One-directed paths: On the left-hand side is the name of the variable that causal effects point to (direction of the arrows, in dagitty-language). On the right-hand side are all variables where causal effects are assumed to come from. For example, for the formula Y ~ X1 + X2, paths directed from both X1 and X2 to Y are assumed.
Bi-directed paths: Use ~~ to indicate bi-directed paths. For example, Y ~~ X indicates that the path between Y and X is bi-directed, and the arrow points in both directions. Bi-directed paths often indicate an unmeasured cause, or unmeasured confounding, of the two involved variables.
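To make the reading of the two formula types concrete, the following base R sketch translates a formula into a list of arrows. It is an illustration only and not how check_dag() or ggdag::dagify() process formulas internally. The helper formula_to_edges() is a hypothetical name introduced here.

```r
# Hypothetical helper: describe the arrows implied by a DAG formula.
# `Y ~ X1 + X2` is read as X1 -> Y and X2 -> Y.
# `Y ~~ X` is parsed by R as `Y ~ (~X)`, which is used to detect
# a bi-directed path.
formula_to_edges <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3L)
  lhs <- all.vars(f[[2]])
  rhs_expr <- f[[3]]
  # A right-hand side that is itself a one-sided `~` call marks `~~`
  bidirected <- is.call(rhs_expr) &&
    identical(rhs_expr[[1]], as.name("~")) &&
    length(rhs_expr) == 2L
  rhs <- all.vars(rhs_expr)
  arrow <- if (bidirected) "<->" else "->"
  paste(rhs, arrow, lhs)
}

edges_directed <- formula_to_edges(Y ~ X1 + X2)
edges_bidirected <- formula_to_edges(Y ~~ X)
print(edges_directed)
print(edges_bidirected)
```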
Minimally required adjustments
The function checks if the model is correctly adjusted for identifying the direct and total effects of the exposure on the outcome, and it evaluates both kinds of effect separately. If the model is correctly specified, no further adjustment is needed to estimate the direct effect. If the model is not correctly specified, the function suggests the minimally required variables that should be adjusted for. If the model is cyclic, the function stops and suggests removing cycles from the model.
Note that it can sometimes be necessary to try out different combinations
of suggested adjustments, because check_dag() cannot always detect whether
at least one of several variables is required, or whether adjustments should
be done for all listed variables. It can be useful to copy the dagitty-code
(using as.dag(), which prints the dagitty-string into the console) into
the dagitty-website and play around with different adjustments.
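As an alternative to the website, adjustment sets can also be explored locally with dagitty::adjustmentSets(), one of the functions that check_dag() wraps. The sketch below assumes the dagitty package is installed and writes the DAG directly in dagitty syntax. The graph is a hand-written equivalent of the formulas y ~ x + b + c and x ~ b, not output produced by as.dag().

```r
# Sketch only: runs when the 'dagitty' package is available.
if (requireNamespace("dagitty", quietly = TRUE)) {
  # Hand-written DAG: b -> x, b -> y, c -> y, x -> y
  g <- dagitty::dagitty("dag { b -> x ; b -> y ; c -> y ; x -> y }")

  # Minimal adjustment sets for the total effect of x on y
  total_sets <- dagitty::adjustmentSets(
    g, exposure = "x", outcome = "y", effect = "total"
  )
  # Minimal adjustment sets for the direct effect of x on y
  direct_sets <- dagitty::adjustmentSets(
    g, exposure = "x", outcome = "y", effect = "direct"
  )

  print(total_sets)
  print(direct_sets)
}
```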
Direct and total effects
The direct effect of an exposure on an outcome is the effect that is not mediated by any other variable in the model. The total effect is the sum of the direct and indirect effects. The function checks if the model is correctly adjusted for identifying the direct and total effects of the exposure on the outcome.
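The distinction can be illustrated with simulated data in base R. The sketch below assumes a simple linear setting with one mediator m. The variable names and coefficients are made up for illustration. Regressing y on x alone estimates the total effect, while additionally adjusting for the mediator estimates the direct effect.

```r
# Simulated linear example with a mediator (all values are illustrative)
set.seed(123)
n <- 10000
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)            # x -> m
y <- 0.3 * x + 0.8 * m + rnorm(n)  # x -> y (direct) and m -> y

# Total effect: do not adjust for the mediator.
# Theoretical value: 0.3 + 0.5 * 0.8 = 0.7
total_effect <- coef(lm(y ~ x))[["x"]]

# Direct effect: adjust for the mediator.
# Theoretical value: 0.3
direct_effect <- coef(lm(y ~ x + m))[["x"]]

print(c(total = total_effect, direct = direct_effect))
```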
Why are DAGs important - the Table 2 fallacy
Correctly thinking about and identifying the relationships between variables is important when it comes to reporting coefficients from regression models that mutually adjust for "confounders" or include covariates. Different coefficients might have different interpretations, depending on their relationship to other variables in the model. Sometimes, a regression coefficient represents the direct effect of an exposure on an outcome, but sometimes it must be interpreted as a total effect, due to the involvement of mediating effects. This problem is also called the "Table 2 fallacy" (Westreich and Greenland 2013). DAGs help to visualize, and thereby focus on, the relationships of variables in a regression model to detect missing adjustments or over-adjustment.
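A small base R simulation can make the fallacy concrete. The sketch below assumes a linear setting in which b confounds the effect of x on y. All names and coefficients are made up for illustration. In the mutually adjusted model, the coefficient of x estimates its total effect, whereas the coefficient of b only estimates the direct effect of b, because part of its effect runs through x.

```r
# Simulated example: b -> x, b -> y, x -> y (all values are illustrative)
set.seed(42)
n <- 10000
b <- rnorm(n)
x <- 0.6 * b + rnorm(n)
y <- 0.5 * x + 0.4 * b + rnorm(n)

# Mutually adjusted model, as typically reported in a "Table 2"
adjusted <- coef(lm(y ~ x + b))
x_coef <- adjusted[["x"]]  # total effect of x, theoretical value 0.5
b_coef <- adjusted[["b"]]  # direct effect of b only, theoretical value 0.4

# Total effect of b, which includes the path through x.
# Theoretical value: 0.4 + 0.6 * 0.5 = 0.7
b_total <- coef(lm(y ~ b))[["b"]]

print(c(x_adjusted = x_coef, b_adjusted = b_coef, b_total = b_total))
```

Reading both adjusted coefficients as the same kind of effect would understate the total effect of b in this setting.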
References
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. doi:10.1177/2515245917745629
Westreich, D., & Greenland, S. (2013). The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients. American Journal of Epidemiology, 177(4), 292–298. doi:10.1093/aje/kws412
Examples
# no adjustment needed
check_dag(
  y ~ x + b,
  outcome = "y",
  exposure = "x"
)
# incorrect adjustment
dag <- check_dag(
  y ~ x + b + c,
  x ~ b,
  outcome = "y",
  exposure = "x"
)
dag
plot(dag)
# After adjusting for `b`, the model is correctly specified
dag <- check_dag(
  y ~ x + b + c,
  x ~ b,
  outcome = "y",
  exposure = "x",
  adjusted = "b"
)
dag
# using formula interface for arguments "outcome", "exposure" and "adjusted"
check_dag(
  y ~ x + b + c,
  x ~ b,
  outcome = ~y,
  exposure = ~x,
  adjusted = ~ b + c
)
# if not provided, "outcome" is taken from first formula, same for "exposure"
# thus, we can simplify the above expression to
check_dag(
  y ~ x + b + c,
  x ~ b,
  adjusted = ~ b + c
)
# use specific layout for the DAG
dag <- check_dag(
  score ~ exp + b + c,
  exp ~ b,
  outcome = "score",
  exposure = "exp",
  coords = list(
    # x-coordinates for all nodes
    x = c(score = 5, exp = 4, b = 3, c = 3),
    # y-coordinates for all nodes
    y = c(score = 3, exp = 3, b = 2, c = 4)
  )
)
plot(dag)
# alternative way of providing the coordinates
dag <- check_dag(
  score ~ exp + b + c,
  exp ~ b,
  outcome = "score",
  exposure = "exp",
  coords = list(
    # x/y coordinates for each node
    score = c(5, 3),
    exp = c(4, 3),
    b = c(3, 2),
    c = c(3, 4)
  )
)
plot(dag)
# Objects returned by `check_dag()` can be used with "ggdag" or "dagitty"
ggdag::ggdag_status(dag)
# Using a model object to extract information about outcome,
# exposure and adjusted variables
data(mtcars)
m <- lm(mpg ~ wt + gear + disp + cyl, data = mtcars)
dag <- check_dag(
  m,
  wt ~ disp + cyl,
  wt ~ am
)
dag
plot(dag)