marginal_comparison.bgmfit {bsitar}R Documentation

Estimate and compare growth curves

Description

The marginal_comparison() function estimates and compares growth curves such as distance and velocity. This function is a wrapper around marginaleffects::comparisons() and marginaleffects::avg_comparisons(). The marginaleffects::comparisons() function computes unit-level (conditional) estimates, whereas marginaleffects::avg_comparisons() returns average (marginal) estimates. A detailed explanation is available here. Note that the marginaleffects package is highly flexible, and users are expected to have a strong understanding of its workings. Additionally, since the marginaleffects package is evolving rapidly, results from the current implementation should be considered experimental.

Usage

## S3 method for class 'bgmfit'
marginal_comparison(
  model,
  resp = NULL,
  dpar = NULL,
  ndraws = NULL,
  draw_ids = NULL,
  newdata = NULL,
  datagrid = NULL,
  re_formula = NA,
  allow_new_levels = FALSE,
  sample_new_levels = "gaussian",
  xrange = 1,
  digits = 2,
  numeric_cov_at = NULL,
  aux_variables = NULL,
  levels_id = NULL,
  avg_reffects = NULL,
  idata_method = NULL,
  ipts = NULL,
  seed = 123,
  future = FALSE,
  future_session = "multisession",
  future_splits = NULL,
  future_method = "future",
  future_re_expose = NULL,
  usedtplyr = FALSE,
  usecollapse = TRUE,
  cores = NULL,
  average = FALSE,
  plot = FALSE,
  showlegends = NULL,
  variables = NULL,
  deriv = NULL,
  deriv_model = NULL,
  method = "pkg",
  marginals = NULL,
  pdrawso = FALSE,
  pdrawsp = FALSE,
  pdrawsh = FALSE,
  comparison = "difference",
  type = NULL,
  by = FALSE,
  conf_level = 0.95,
  transform = NULL,
  cross = FALSE,
  wts = NULL,
  hypothesis = NULL,
  equivalence = NULL,
  eps = NULL,
  constrats_by = NULL,
  constrats_at = NULL,
  reformat = NULL,
  estimate_center = NULL,
  estimate_interval = NULL,
  dummy_to_factor = NULL,
  verbose = FALSE,
  expose_function = FALSE,
  usesavedfuns = NULL,
  clearenvfuns = NULL,
  funlist = NULL,
  envir = NULL,
  ...
)

marginal_comparison(model, ...)

Arguments

model

An object of class bgmfit.

resp

A character string (default NULL) to specify the response variable when processing posterior draws for univariate_by and multivariate models. See bsitar() for details on univariate_by and multivariate models.

dpar

Optional name of a predicted distributional parameter. If specified, expected predictions of this parameters are returned.

ndraws

A positive integer indicating the number of posterior draws to use in estimation. If NULL (default), all draws are used.

draw_ids

An integer specifying the specific posterior draw(s) to use in estimation (default NULL).

newdata

An optional data frame for estimation. If NULL (default), newdata is retrieved from the model.

datagrid

A data frame or named list for setting up a custom grid of predictor values to evaluate the quantities of interest. If NULL (default), no custom grid is used. The grid can be constructed using marginaleffects::datagrid(). If datagrid = list(), essential arguments such as model and newdata are inferred automatically from the respective arguments in the model fit.

re_formula

Option to indicate whether or not to include individual/group-level effects in the estimation. When NA (default), individual-level effects are excluded, and population average growth parameters are computed. When NULL, individual-level effects are included in the computation, and the resulting growth parameters are individual-specific. In both cases (NA or NULL), continuous and factor covariates are appropriately included in the estimation. Continuous covariates are set to their means by default (see numeric_cov_at for details), while factor covariates remain unaltered, allowing for the estimation of covariate-specific population average and individual-specific growth parameters.

allow_new_levels

A flag indicating if new levels of group-level effects are allowed (defaults to FALSE). Only relevant if newdata is provided.

sample_new_levels

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), each posterior sample for a new level is drawn from the posterior draws of a randomly chosen existing level. Each posterior sample for a new level may be drawn from a different existing level such that the resulting set of new posterior draws represents the variation across existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This options may be useful for conducting Bayesian power analysis or predicting new levels in situations where relatively few levels where observed in the old_data. If "old_levels", directly sample new levels from the existing levels, where a new level is assigned all of the posterior draws of the same (randomly chosen) existing level.

xrange

An integer to set the predictor range (e.g., age) when executing the interpolation via ipts. By default, NULL sets the individual-specific predictor range. Setting xrange = 1 applies the same range for individuals within the same higher grouping variable (e.g., study). Setting xrange = 2 applies an identical range across the entire sample. Alternatively, a numeric vector (e.g., xrange = c(6, 20)) can be provided to set the range within the specified values.

digits

An integer (default 2) for rounding the estimates to the specified number of decimal places using base::round().

numeric_cov_at

An optional (named list) argument to specify the value of continuous covariate(s). The default NULL option sets the continuous covariate(s) to their mean. Alternatively, a named list can be supplied to manually set these values. For example, numeric_cov_at = list(xx = 2) will set the continuous covariate variable 'xx' to 2. The argument numeric_cov_at is ignored when no continuous covariates are included in the model.

aux_variables

An optional argument to specify the variable(s) that can be passed to the ipts argument (see below). This is useful when fitting location-scale models and measurement error models. If post-processing functions throw an error such as variable 'x' not found in either 'data' or 'data2', consider using aux_variables.

levels_id

An optional argument to specify the ids for the hierarchical model (default NULL). It is used only when the model is applied to data with three or more levels of hierarchy. For a two-level model, levels_id is automatically inferred from the model fit. For models with three or more levels, levels_id is inferred from the model fit under the assumption that hierarchy is specified from the lowest to the uppermost level, i.e., id followed by study, where id is nested within study. However, it is not guaranteed that levels_id is sorted correctly, so it is better to set it manually when fitting a model with three or more levels of hierarchy.

avg_reffects

An optional argument (default NULL) to calculate (marginal/average) curves and growth parameters, such as APGV and PGV. If specified, it must be a named list indicating the over (typically a level 1 predictor, such as age), feby (fixed effects, typically a factor variable), and reby (typically NULL, indicating that parameters are integrated over the random effects). For example, avg_reffects = list(feby = 'study', reby = NULL, over = 'age').

idata_method

A character string to indicate the interpolation method. The number of interpolation points is set by the ipts argument. Available options for idata_method are method 1 (specified as 'm1') and method 2 (specified as 'm2').

  • Method 1 ('m1') is adapted from the iapvbs package and is documented here.

  • Method 2 ('m2') is based on the JMbayes package and is documented here. The 'm1' method works by internally constructing the data frame based on the model configuration, while the 'm2' method uses the exact data frame from the model fit, accessible via fit$data. If idata_method = NULL (default), method 'm2' is automatically selected. Note that method 'm1' may fail in certain cases, especially when the model includes covariates (particularly in univariate_by models). In such cases, it is recommended to use method 'm2'.

ipts

An integer to set the length of the predictor variable for generating a smooth velocity curve. If NULL, the original values are returned. If an integer (e.g., ipts = 10, default), the predictor is interpolated. Note that these interpolations do not alter the range of the predictor when calculating population averages and/or individual-specific growth curves.

seed

An integer (default 123) that is passed to the estimation method to ensure reproducibility.

future

A logical value (default FALSE) to specify whether or not to perform parallel computations. If set to TRUE, the future.apply::future_sapply() function is used to summarize the posterior draws in parallel.

future_session

A character string specifying the session type when future = TRUE. The 'multisession' (default) option sets the multisession environment, while the 'multicore' option sets up a multicore session. Note that 'multicore' is not supported on Windows systems. For more details, see future.apply::future_sapply().

future_splits

A list (default NULL) that can be an unnamed numeric list, a logical value, or a numeric vector of length 1 or 2. It is used to split the processing of posterior draws into smaller subsets for parallel computation.

  • If passed as a list (e.g., future_splits = list(1:6, 7:10)), each sequence of numbers is passed to the draw_ids argument.

  • If passed as a numeric vector (e.g., future_splits = c(10, 2)), the first element specifies the number of draws (see draw_ids) and the second element indicates the number of splits. The splits are created using parallel::splitIndices().

  • If passed as a numeric vector of length 1, the first element is internally set as the number of draws (ndraws or draw_ids) depending on which one is not NULL.

  • If TRUE, a numeric vector for future_splits is created based on the number of draws (ndraws) and the number of cores (cores).

  • If FALSE, future_splits is ignored. The use case for future_splits is to save memory and improve performance, especially on Linux systems when future::plan() is set to multicore. Note: on Windows systems, R processes may not be freed automatically when using 'multisession'. In such cases, the R processes can be interrupted using installr::kill_all_Rscript_s().

future_method

A character string (default 'future') to specify the method for parallel computation. Options include:

future_re_expose

A logical (default NULL) to indicate whether to re-expose Stan functions when future = TRUE. This is especially relevant when future::plan() is set to 'multisession', as already exposed C++ Stan functions cannot be passed across multiple sessions.

  • When future_re_expose = NULL (the default), future_re_expose is automatically set to TRUE for the 'multisession' plan.

  • It is advised to explicitly set future_re_expose = TRUE for speed gains when using parallel processing with future = TRUE.

usedtplyr

A logical (default FALSE) indicating whether to use the dtplyr package for summarizing the draws. This package uses data.table as a back-end. It is useful when the data has a large number of observations. For typical use cases, it does not make a significant performance difference. The usedtplyr argument is evaluated only when method = 'custom'.

usecollapse

A logical (default FALSE) to indicate whether to use the collapse package for summarizing the draws.

cores

The number of cores to be used for parallel computations if future = TRUE. On non-Windows systems, this argument can be set globally via the mc.cores option. By default, NULL, the number of cores is automatically determined using future::availableCores(), and it will use the maximum number of cores available minus one (i.e., future::availableCores() - 1).

average

A logical indicating whether to call marginaleffects::comparisons() (if FALSE) or marginaleffects::avg_comparisons() (if TRUE). Default is FALSE.

plot

A logical indicating whether to plot the comparisons using marginaleffects::plot_comparisons(). Default is FALSE.

showlegends

A logical value to specify whether to show legends (TRUE) or not (FALSE). If NULL (default), the value of showlegends is internally set to TRUE if re_formula = NA, and FALSE if re_formula = NULL.

variables

A named list specifying the level 1 predictor, such as age or time, used for estimating growth parameters in the current use case. The variables list is set via the esp argument (default value is 1e-6). If variables is NULL, the relevant information is retrieved internally from the model. Alternatively, users can define variables as a named list, e.g., variables = list('x' = 1e-6) where 'x' is the level 1 predictor. By default, variables = list('age' = 1e-6) in the marginaleffects package, as velocity is usually computed by differentiating the distance curve using the dydx approach. When using this default, the argument deriv is automatically set to 0 and deriv_model to FALSE. If parameters are to be estimated based on the model's first derivative, deriv must be set to 1 and variables will be defined as variables = list('age' = 0). Note that if the default behavior is used (deriv = 0 and variables = list('x' = 1e-6)), additional arguments cannot be passed to variables. In contrast, when using an alternative approach (deriv = 0 and variables = list('x' = 0)), additional options can be passed to the marginaleffects::comparisons() and marginaleffects::avg_comparisons() functions.

deriv

A numeric value indicating whether to estimate parameters based on the differentiation of the distance curve or the model's first derivative.

deriv_model

A logical value specifying whether to estimate the velocity curve from the derivative function or by differentiating the distance curve. Set deriv_model = TRUE for functions that require the velocity curve, such as growthparameters() and plot_curves(). Set it to NULL for functions that use the distance curve (i.e., fitted values), such as loo_validation() and plot_ppc().

method

A character string specifying whether to compute comparisons using the marginaleffects machinery (method = 'pkg') or via custom functions for efficiency (method = 'custom'). Default is 'pkg'. The 'custom' method is useful when testing hypotheses and should be used cautiously.

marginals

A list, data.frame, or tibble returned by the marginaleffects functions (default NULL). This is only evaluated when method = 'custom'. The marginals can be the output from marginaleffects functions or posterior draws from marginaleffects::posterior_draws(). The marginals argument is primarily used for internal purposes.

pdrawso

A character string (default FALSE) to indicate whether to return the original posterior draws for parameters. Options include:

  • 'return': returns the original posterior draws,

  • 'add': adds the original posterior draws to the outcome.

When pdrawso = TRUE, the default behavior is pdrawso = 'return'. Note that the posterior draws are returned before calling marginaleffects::posterior_draws().

pdrawsp

A character string (default FALSE) to indicate whether to return the posterior draws for parameters. Options include:

  • 'return': returns the posterior draws for parameters,

  • 'add': adds the posterior draws to the outcome.

When pdrawsp = TRUE, the default behavior is pdrawsp = 'return'. The pdrawsp represent the parameter estimates for each of the posterior samples, and the summary of these are the estimates returned.

pdrawsh

A character string (default FALSE) to indicate whether to return the posterior draws for parameter contrasts. Options include:

  • 'return': returns the posterior draws for contrasts.

The summary of posterior draws for parameters is the default returned object. The pdrawsh represent the contrast estimates for each of the posterior samples, and the summary of these are the contrast returned.

comparison

A character string specifying the comparison type for growth parameter estimation. Options are 'difference' and 'differenceavg'. This argument sets up the internal function for estimating parameters using sitar::getPeak(), sitar::getTakeoff(), and sitar::getTrough() functions. These options are restructured according to the user-specified hypothesis argument.

type

string indicates the type (scale) of the predictions used to compute contrasts or slopes. This can differ based on the model type, but will typically be a string such as: "response", "link", "probs", or "zero". When an unsupported string is entered, the model-specific list of acceptable values is returned in an error message. When type is NULL, the first entry in the error message is used by default.

by

Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:

  • FALSE: return the original unit-level estimates.

  • TRUE: aggregate estimates for each term.

  • Character vector of column names in newdata or in the data frame produced by calling the function without the by argument.

  • Data frame with a by column of group labels, and merging columns shared by newdata or the data frame produced by calling the same function without the by argument.

  • See examples below.

  • For more complex aggregations, you can use the FUN argument of the hypotheses() function. See that function's documentation and the Hypothesis Test vignettes on the marginaleffects website.

conf_level

numeric value between 0 and 1. Confidence level to use to build a confidence interval.

transform

A function applied to individual draws from the posterior distribution before computing summaries. The argument transform is based on the marginaleffects::predictions() function. This should not be confused with transform from brms::posterior_predict(), which is now deprecated.

cross
  • FALSE: Contrasts represent the change in adjusted predictions when one predictor changes and all other variables are held constant.

  • TRUE: Contrasts represent the changes in adjusted predictions when all the predictors specified in the variables argument are manipulated simultaneously (a "cross-contrast").

wts

logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in ⁠avg_*()⁠ or with the by argument, and not unit-level estimates. See ?weighted.mean

  • string: column name of the weights variable in newdata. When supplying a column name to wts, it is recommended to supply the original data (including the weights variable) explicitly to newdata.

  • numeric: vector of length equal to the number of rows in the original data or in newdata (if supplied).

  • FALSE: Equal weights.

  • TRUE: Extract weights from the fitted object with insight::find_weights() and use them when taking weighted averages of estimates. Warning: newdata=datagrid() returns a single average weight, which is equivalent to using wts=FALSE

hypothesis

specify a hypothesis test or custom contrast using a number , formula, string equation, vector, matrix, or function.

  • Number: The null hypothesis used in the computation of Z and p (before applying transform).

  • String: Equation to specify linear or non-linear hypothesis tests. If the terms in coef(object) uniquely identify estimates, they can be used in the formula. Otherwise, use b1, b2, etc. to identify the position of each parameter. The ⁠b*⁠ wildcard can be used to test hypotheses on all estimates. If a named vector is used, the names are used as labels in the output. Examples:

    • hp = drat

    • hp + drat = 12

    • b1 + b2 + b3 = 0

    • ⁠b* / b1 = 1⁠

  • Formula: lhs ~ rhs | group

    • lhs

      • ratio

      • difference

      • Leave empty for default value

    • rhs

      • pairwise and revpairwise: pairwise differences between estimates in each row.

      • reference: differences between the estimates in each row and the estimate in the first row.

      • sequential: difference between an estimate and the estimate in the next row.

      • meandev: difference between an estimate and the mean of all estimates.

      • 'meanotherdev: difference between an estimate and the mean of all other estimates, excluding the current one.

      • poly: polynomial contrasts, as computed by the stats::contr.poly() function.

      • helmert: Helmert contrasts, as computed by the stats::contr.helmert() function. Contrast 2nd level to the first, 3rd to the average of the first two, and so on.

      • trt_vs_ctrl: difference between the mean of estimates (except the first) and the first estimate.

      • I(fun(x)): custom function to manipulate the vector of estimates x. The function fun() can return multiple (potentially named) estimates.

    • group (optional)

      • Column name of newdata. Conduct hypothesis tests withing subsets of the data.

    • Examples:

      • ~ poly

      • ~ sequential | groupid

      • ~ reference

      • ratio ~ pairwise

      • difference ~ pairwise | groupid

      • ~ I(x - mean(x)) | groupid

      • ⁠~ I(\(x) c(a = x[1], b = mean(x[2:3]))) | groupid⁠

  • Matrix or Vector: Each column is a vector of weights. The the output is the dot product between these vectors of weights and the vector of estimates. The matrix can have column names to label the estimates.

  • Function:

    • Accepts an argument x: object produced by a marginaleffects function or a data frame with column rowid and estimate

    • Returns a data frame with columns term and estimate (mandatory) and rowid (optional).

    • The function can also accept optional input arguments: newdata, by, draws.

    • This function approach will not work for Bayesian models or with bootstrapping. In those cases, it is easy to use get_draws() to extract and manipulate the draws directly.

  • See the Examples section below and the vignette: https://marginaleffects.com/chapters/hypothesis.html

equivalence

Numeric vector of length 2: bounds used for the two-one-sided test (TOST) of equivalence, and for the non-inferiority and non-superiority tests. See Details section below.

eps

NULL or numeric value which determines the step size to use when calculating numerical derivatives: (f(x+eps)-f(x))/eps. When eps is NULL, the step size is 0.0001 multiplied by the difference between the maximum and minimum values of the variable with respect to which we are taking the derivative. Changing eps may be necessary to avoid numerical problems in certain models.

constrats_by

A character vector (default NULL) specifying the variables by which hypotheses should be tested (e.g., for post-draw comparisons). These variables should be a subset of the variables in the by argument of marginaleffects::comparisons().

constrats_at

A character vector (default NULL) specifying the values at which hypotheses should be tested. Useful for large estimates to limit testing to fewer rows.

reformat

A logical (default TRUE) to reformat the output from marginaleffects::comparisons() as a data.frame, with column names such as conf.low as Q2.5, conf.high as Q97.5, and dropping unnecessary columns like term, contrast, and predicted.

estimate_center

A character string (default NULL) specifying how to center estimates: either 'mean' or 'median'. This option sets the global options as follows: options("marginaleffects_posterior_center" = "mean") or options("marginaleffects_posterior_center" = "median"). These global options are restored upon function exit using base::on.exit().

estimate_interval

A character string (default NULL) to specify the type of credible intervals: 'eti' for equal-tailed intervals or 'hdi' for highest density intervals. This option sets the global options as follows: options("marginaleffects_posterior_interval" = "eti") or options("marginaleffects_posterior_interval" = "hdi"), and is restored on exit using base::on.exit().

dummy_to_factor

A named list (default NULL) to convert dummy variables into a factor variable. The list must include the following elements:

  • factor.dummy: A character vector of dummy variables to be converted to factors.

  • factor.name: The name for the newly created factor variable (default is 'factor.var' if NULL).

  • factor.level: A vector specifying the factor levels. If NULL, levels are taken from factor.dummy. If factor.level is provided, its length must match factor.dummy.

verbose

A logical argument (default FALSE) to specify whether to print information collected during the setup of the object(s).

expose_function

A logical argument (default FALSE) to indicate whether Stan functions should be exposed. If TRUE, any Stan functions exposed during the model fit using expose_function = TRUE in the bsitar() function are saved and can be used in post-processing. By default, expose_function = FALSE in post-processing functions, except in optimize_model() where it is set to NULL. If NULL, the setting is inherited from the original model fit. It must be set to TRUE when adding fit criteria or bayes_R2 during model optimization.

usesavedfuns

A logical value (default NULL) indicating whether to use already exposed and saved Stan functions. This is typically set automatically based on the expose_functions argument from the bsitar() call. Manual specification of usesavedfuns is rarely needed and is intended for internal testing, as improper use can lead to unreliable estimates.

clearenvfuns

A logical value indicating whether to clear the exposed Stan functions from the environment (TRUE) or not (FALSE). If NULL, clearenvfuns is set based on the value of usesavedfuns: TRUE if usesavedfuns = TRUE, or FALSE if usesavedfuns = FALSE.

funlist

A list (default NULL) specifying function names. This is rarely needed, as required functions are typically retrieved automatically. A use case for funlist is when sigma_formula, sigma_formula_gr, or sigma_formula_gr_str use an external function (e.g., poly(age)). The funlist should include function names defined in the globalenv(). For functions needing both distance and velocity curves (e.g., plot_curves(..., opt = 'dv')), funlist must include two functions: one for the distance curve and one for the velocity curve.

envir

The environment used for function evaluation. The default is NULL, which sets the environment to parent.frame(). Since most post-processing functions rely on brms, it is recommended to set envir = globalenv() or envir = .GlobalEnv, especially for derivatives like velocity curves.

...

Additional arguments passed to the brms::fitted.brmsfit() and brms::predict() functions.

Value

A data frame with estimates and confidence intervals for the specified parameters, or a list when method = 'custom' and hypothesis is not NULL.

Author(s)

Satpal Sandhu satpal.sandhu@bristol.ac.uk

References

There are no references for Rd macro ⁠\insertAllCites⁠ on this help page.

See Also

marginaleffects::comparisons(), marginaleffects::avg_comparisons(), marginaleffects::plot_comparisons()

Examples




# Fit Bayesian SITAR model 

# To avoid mode estimation which takes time, the Bayesian SITAR model fit to 
# the 'berkeley_exdata' has been saved as an example fit ('berkeley_exfit').
# See 'bsitar' function for details on 'berkeley_exdata' and 'berkeley_exfit'.

# Note: Since no covariates are included in this model, the 'marginal_comparison' 
# function is being shown here as a dummy example. In practice, comparisons may 
# not make sense without relevant covariates. 

# Check and confirm whether model fit object 'berkeley_exfit' exists
berkeley_exfit <- getNsObject(berkeley_exfit)

model <- berkeley_exfit

# Call marginal_comparison to demonstrate the function
marginal_comparison(model, draw_ids = 1)



[Package bsitar version 0.3.2 Index]