estimate_contrasts {modelbased} | R Documentation |
Estimate Marginal Contrasts
Description
Run a contrast analysis by estimating the differences between each level of a
factor. See also other related functions such as estimate_means() and
estimate_slopes().
Usage
estimate_contrasts(model, ...)
## Default S3 method:
estimate_contrasts(
model,
contrast = NULL,
by = NULL,
predict = NULL,
ci = 0.95,
comparison = "pairwise",
estimate = NULL,
p_adjust = "none",
transform = NULL,
keep_iterations = FALSE,
effectsize = NULL,
iterations = 200,
es_type = "cohens.d",
backend = NULL,
verbose = TRUE,
...
)
Arguments
model |
A statistical model. |
... |
Other arguments passed, for instance, to insight::get_datagrid() or to
functions of the emmeans or marginaleffects backend.
|
contrast |
A character vector indicating the name of the variable(s) for
which to compute the contrasts, optionally including representative values or
levels at which contrasts are evaluated (e.g., contrast = "x = c(1, 2)"). |
by |
The (focal) predictor variable(s) at which to evaluate the desired
effect / mean / contrasts. Other predictors of the model that are not
included here will be collapsed and "averaged" over (the effect will be
estimated across them). |
predict |
Determines the scale on which predictions are made before contrasts are
computed (e.g., "link" or "response"); passed to the backend's prediction
function. See also section Predictions on different scales. |
ci |
Confidence Interval (CI) level. Defaults to 0.95 (95%). |
comparison |
Specify the type of contrasts or tests that should be carried out.
|
estimate |
Character string specifying how predictions are averaged over the non-focal
predictors when computing contrasts. You can set a default option for the
estimate argument via options(). |
p_adjust |
The p-values adjustment method for frequentist multiple
comparisons. Can be one of the methods supported by stats::p.adjust()
(e.g., "holm" or "bonferroni"); defaults to "none". |
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., lm(log(y) ~ x)). |
keep_iterations |
If TRUE, keeps all iterations (draws) of bootstrapped or Bayesian models.
Defaults to FALSE. |
effectsize |
Desired measure of standardized effect size, one of "emmeans",
"marginal", or "boot" (see the Effect Size section below). No effect size is
returned by default. |
iterations |
The number of bootstrap resamples to perform. |
es_type |
Specifies the type of effect-size measure to estimate when
using effectsize = "boot" (passed to the bootES package). Defaults to
"cohens.d". |
backend |
Whether to use "emmeans" or "marginaleffects" as backend for computing
contrasts. Results are usually very similar, but the two packages differ in
which model classes and options they support. You can set a default backend
via options(modelbased_backend = "emmeans") (or "marginaleffects"). |
verbose |
Toggle messages and warnings (use FALSE to silence them). |
Details
The estimate_slopes(), estimate_means() and estimate_contrasts() functions
form a group, as they are all based on marginal estimations (estimations
based on a model). All three are built on the emmeans or marginaleffects
package (depending on the backend argument), so reading their documentation
(for instance emmeans::emmeans(), emmeans::emtrends() or the packages'
websites) is recommended to understand the idea behind these types of
procedures.
- Model-based predictions are the basis for all that follows. Indeed, the
first thing to understand is how models can be used to make predictions (see
estimate_link()). This corresponds to the predicted response (or "outcome
variable") given specific values of the predictors (i.e., given a specific
data configuration). This is why the concept of a reference grid is so
important for direct predictions.
- Marginal "means", obtained via estimate_means(), are an extension of such
predictions, allowing one to "average" (collapse) some of the predictors, to
obtain the average response value at a specific predictor configuration. This
is typically used when some of the predictors of interest are factors.
Indeed, the parameters of the model will usually give you the intercept value
and then the "effect" of each factor level (how different it is from the
intercept). Marginal means can be used to directly give you the mean value of
the response variable at all the levels of a factor. Moreover, they can also
be used to control for, or average over, other predictors, which is useful in
the case of multiple predictors with or without interactions.
- Marginal contrasts, obtained via estimate_contrasts(), are themselves an
extension of marginal means, in that they allow one to investigate the
difference (i.e., the contrast) between the marginal means. This is, again,
often used to get all pairwise differences between all levels of a factor. It
also works for continuous predictors; for instance, one could be interested
in whether the difference at two extremes of a continuous predictor is
significant.
- Finally, marginal effects, obtained via estimate_slopes(), are different in
that their focus is not values of the response variable, but the model's
parameters. The idea is to assess the effect of a predictor at a specific
configuration of the other predictors. This is relevant in the case of
interactions or non-linear relationships, when the effect of a predictor
variable changes depending on the other predictors. Moreover, these effects
can also be "averaged" over other predictors, to get for instance the
"general trend" of a predictor over different factor levels.
Example: Let's imagine the following model lm(y ~ condition * x)
where
condition
is a factor with 3 levels A, B and C and x
a continuous
variable (like age for example). One idea is to see how this model performs,
and compare the actual response y to the one predicted by the model (using
estimate_expectation()
). Another idea is to evaluate the average response at each of
the condition's levels (using estimate_means()
), which can be useful to
visualize them. Another possibility is to evaluate the difference between
these levels (using estimate_contrasts()
). Finally, one could also estimate
the effect of x averaged over all conditions, or instead within each
condition (using estimate_slopes()
).
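As a rough base-R sketch of this toy example (the simulated data and coefficient values below are made up for illustration, and modelbased is not required):

```r
# Hypothetical data mirroring the lm(y ~ condition * x) example above;
# the simulated effects (0.5 per level step, 0.3 for x) are invented.
set.seed(123)
n <- 90
condition <- factor(rep(c("A", "B", "C"), each = n / 3))
x <- rnorm(n)
y <- 1 + 0.5 * as.numeric(condition) + 0.3 * x + rnorm(n, sd = 0.2)
model <- lm(y ~ condition * x)

# Raw coefficients only answer "how do B and C differ from A at x = 0?"
coef(model)[c("conditionB", "conditionC")]

# Group means of the fitted values correspond roughly to what
# estimate_means() reports when averaging over x
tapply(fitted(model), condition, mean)
```

estimate_contrasts(model) would then report the pairwise differences between these three means, and estimate_slopes(model, "x", by = "condition") the slope of x within each level.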
Value
A data frame of estimated contrasts.
Comparison options
- comparison = "pairwise": This method computes all possible unique
differences between pairs of levels of the focal predictor. For example, if a
factor has levels A, B, and C, it would compute A-B, A-C, and B-C.
- comparison = "reference": This compares each level of the focal predictor
to a specified reference level (by default, the first level). For example, if
levels are A, B, C, and A is the reference, it computes B-A and C-A.
- comparison = "sequential": This compares each level to the one immediately
following it in the factor's order. For levels A, B, C, it would compute B-A
and C-B.
- comparison = "meandev": This contrasts each level's estimate against the
grand mean of all estimates for the focal predictor.
- comparison = "meanotherdev": Similar to meandev, but each level's estimate
is compared against the mean of all other levels, excluding itself.
- comparison = "poly": These contrasts are used for ordered categorical
variables to test for linear, quadratic, cubic, etc., trends across the
levels. They assume equal spacing between levels.
- comparison = "helmert": Contrasts the 2nd level to the first, the 3rd to
the average of the first two, and so on. Each level (except the first) is
compared to the mean of the preceding levels. For levels A, B, C, it would
compute B-A and C-(A+B)/2.
- comparison = "trt_vs_ctrl": This compares all levels (excluding the first,
which is typically the control) against the first level. It's often used when
comparing multiple treatment groups to a single control group.
- comparison = "joint": Tests multiple hypotheses jointly (usually used for
factorial designs). In this case, use the test argument to specify which test
should be conducted: "F" (default) or "Chi2".
- comparison = "inequality": Computes the marginal effect inequality summary
of categorical predictors' overall effects, respectively the comprehensive
effect of an independent variable across all outcome categories of a nominal
or ordinal dependent variable (total marginal effect, see Mize and Han,
2025). The marginal effect inequality focuses on the heterogeneity of the
effects of a categorical independent variable and helps understand how the
effect of the variable differs across its categories or levels. When the
dependent variable is categorical (e.g., logistic, ordinal or multinomial
regression), marginal effect inequality provides a holistic view of how an
independent variable affects that dependent variable, summarizing the
overall impact (total marginal effects) across all possible outcome
categories.
- comparison = "inequality_pairwise": Computes the difference (pairwise
comparisons) between marginal effect inequality measures. Depending on the
sign, this measure indicates which of the predictors has a stronger impact on
the dependent variable in terms of inequalities.
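Several of the schemes above have base-R analogues in the built-in contrast matrices. The sketch below is purely conceptual: modelbased delegates the actual computation to its backend, and base R's codings are scaled differently.

```r
levs <- c("A", "B", "C")

# "reference" / "trt_vs_ctrl": each level compared to the first
contr.treatment(levs)

# "helmert"-style coding: each level against the mean of the preceding
# ones (base R scales the columns differently, but spans the same
# comparisons, i.e. B-A and C-(A+B)/2)
contr.helmert(levs)

# "pairwise": all unique pairs of levels, built by hand here
t(combn(levs, 2))

# "poly": orthogonal linear and quadratic trend contrasts
contr.poly(3)
```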
Effect Size
By default, estimate_contrasts()
deliberately reports no standardized effect size. Should one be requested,
there are some things to keep in mind. As the
authors of emmeans write, "There is substantial disagreement among
practitioners on what is the appropriate sigma to use in computing effect
sizes; or, indeed, whether any effect-size measure is appropriate for some
situations. The user is completely responsible for specifying appropriate
parameters (or for failing to do so)."
In particular, effect size method "boot"
does not correct for covariates
in the model, so should probably only be used when there is just one
categorical predictor (with however many levels). Some believe that if there
are multiple predictors or any covariates, it is important to re-compute
sigma adding back in the response variance associated with the variables that
aren't part of the contrast.
effectsize = "emmeans"
uses emmeans::eff_size with
sigma = stats::sigma(model)
, edf = stats::df.residual(model)
and
method = "identity"
. This standardizes using the MSE (sigma). Some believe
this works when the contrasts are the only predictors in the model, but not
when there are covariates. The response variance accounted for by the
covariates should not be removed from the SD used to standardize. Otherwise,
d will be overestimated.
effectsize = "marginal"
uses the following formula to compute effect
size: d_adj <- difference * (1 - R2) / sigma. This standardizes
using the response SD with only the between-groups variance on the focal
factor/contrast removed. This allows groups to be equated on their
covariates, while creating an appropriate scale for standardizing the response.
effectsize = "boot"
uses bootstrapping (defaulting to a low number of 200
iterations) through bootES::bootES. It adjusts for contrasts, but not for covariates.
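A minimal base-R sketch of the "marginal" formula quoted above, using iris for one pairwise contrast. This is illustrative only; how modelbased obtains R2 and sigma internally may differ.

```r
# Sketch of d_adj <- difference * (1 - R2) / sigma for the
# setosa - versicolor contrast (not the package's internal code).
model <- lm(Sepal.Width ~ Species, data = iris)
means <- tapply(iris$Sepal.Width, iris$Species, mean)
difference <- means[["setosa"]] - means[["versicolor"]]

r2 <- summary(model)$r.squared  # variance explained by the focal factor
sigma_resid <- sigma(model)     # residual SD

d_adj <- difference * (1 - r2) / sigma_resid
d_adj
```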
Predictions and contrasts at meaningful values (data grids)
To define representative values for focal predictors (specified in by
,
contrast
, and trend
), you can use several methods. These values are
internally generated by insight::get_datagrid()
, so consult its
documentation for more details.
- You can directly specify values as strings or lists for by, contrast, and
trend.
  - For numeric focal predictors, use examples like by = "gear = c(4, 8)",
by = list(gear = c(4, 8)) or by = "gear = 5:10".
  - For factor or character predictors, use
by = "Species = c('setosa', 'virginica')" or
by = list(Species = c('setosa', 'virginica')).
- You can use "shortcuts" within square brackets, such as
by = "Sepal.Width = [sd]" or by = "Sepal.Width = [fivenum]".
- For numeric focal predictors, if no representative values are specified,
length and range control the number and type of representative values:
  - length determines how many equally spaced values are generated.
  - range specifies the type of values, like "range" or "sd".
  - length and range apply to all numeric focal predictors. If you have
multiple numeric predictors, length and range can accept multiple elements,
one for each predictor.
- For integer variables, only values that appear in the data will be included
in the data grid, independent of the length argument. This behaviour can be
changed by setting protect_integers = FALSE, which will then treat integer
variables as numerics (and possibly produce fractions).
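The behaviour of length, range, and the bracket shortcuts can be mimicked in base R. insight::get_datagrid() is the actual implementation; this only illustrates the idea.

```r
x <- iris$Sepal.Width

# length: equally spaced values across the observed range
seq(min(x), max(x), length.out = 5)

# range = "sd": mean and +/- 1 SD as representative values
mean(x) + c(-1, 0, 1) * sd(x)

# the "[fivenum]" shortcut corresponds to Tukey's five-number summary
fivenum(x)

# integers: by default only observed values enter the grid
sort(unique(mtcars$gear))
```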
See also this vignette for some examples.
Predictions on different scales
The predict
argument allows generating predictions on different scales of
the response variable. The "link"
option does not apply to all models, and
usually not to Gaussian models. "link"
will leave the values on the scale of
the linear predictor. "response"
(or NULL
) will transform them to the scale
of the response variable. Thus, for a logistic model, "link"
will give
estimations expressed in log-odds (probabilities on the logit scale) and
"response"
in terms of probabilities.
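For a logistic model, the two scales can be illustrated with base R's predict() (a sketch independent of modelbased):

```r
# "link" vs "response" scale for a logistic regression
model <- glm(am ~ wt, data = mtcars, family = binomial)
nd <- data.frame(wt = 3)

eta <- predict(model, newdata = nd, type = "link")      # log-odds
p   <- predict(model, newdata = nd, type = "response")  # probability

# the inverse link (here plogis) maps log-odds back to probabilities
all.equal(unname(plogis(eta)), unname(p))
```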
To predict distributional parameters (called "dpar" in other packages), for
instance when using complex formulae in brms
models, the predict
argument
can take the value of the parameter you want to estimate, for instance
"sigma"
, "kappa"
, etc.
"response"
and "inverse_link"
both return predictions on the response
scale, however, "response"
first calculates predictions on the response
scale for each observation and then aggregates them by groups or levels
defined in by
. "inverse_link"
first calculates predictions on the link
scale for each observation, then aggregates them by groups or levels defined
in by
, and finally back-transforms the predictions to the response scale.
Both approaches have advantages and disadvantages. "response"
usually
produces less biased predictions, but confidence intervals might be outside
reasonable bounds (i.e., for instance can be negative for count data). The
"inverse_link"
approach is more robust in terms of confidence intervals,
but might produce biased predictions. However, you can try to set
bias_correction = TRUE
, to adjust for this bias.
In particular for mixed models, using "response"
is recommended, because
averaging across random effects groups is then more accurate.
References
Mize, T., & Han, B. (2025). Inequality and Total Effect Summary Measures for Nominal and Ordinal Variables. Sociological Science, 12, 115–157. doi:10.15195/v12.a7
Montiel Olea, J. L., and Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1–17. doi:10.1002/jae.2656
Examples
## Not run:
# Basic usage
model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model)
# Dealing with interactions
model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris)
# By default: selects first factor
estimate_contrasts(model)
# Can also run contrasts between points of numeric, stratified by "Species"
estimate_contrasts(model, contrast = "Petal.Width", by = "Species")
# Or both
estimate_contrasts(model, contrast = c("Species", "Petal.Width"), length = 2)
# Or with custom specifications
estimate_contrasts(model, contrast = c("Species", "Petal.Width = c(1, 2)"))
# Or modulate it
estimate_contrasts(model, by = "Petal.Width", length = 4)
# Standardized differences
estimated <- estimate_contrasts(lm(Sepal.Width ~ Species, data = iris))
standardize(estimated)
# custom factor contrasts - contrasts the average effects of two levels
# against the remaining third level
data(puppy_love, package = "modelbased")
cond_tx <- cbind("no treatment" = c(1, 0, 0), "treatment" = c(0, 0.5, 0.5))
model <- lm(happiness ~ puppy_love * dose, data = puppy_love)
estimate_contrasts(model, contrast = "puppy_love", by = "dose", comparison = cond_tx)
# Other models (mixed, Bayesian, ...)
data <- iris
data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B")
model <- lme4::lmer(Sepal.Width ~ Species + (1 | Petal.Length_factor), data = data)
estimate_contrasts(model)
data <- mtcars
data$cyl <- as.factor(data$cyl)
data$am <- as.factor(data$am)
model <- rstanarm::stan_glm(mpg ~ cyl * wt, data = data, refresh = 0)
estimate_contrasts(model)
estimate_contrasts(model, by = "wt", length = 4)
model <- rstanarm::stan_glm(
Sepal.Width ~ Species + Petal.Width + Petal.Length,
data = iris,
refresh = 0
)
estimate_contrasts(model, by = "Petal.Length = [sd]", test = "bf")
## End(Not run)