predict_draws.bgmfit {bsitar}R Documentation

Predicted values from the posterior predictive distribution

Description

The predict_draws() function is a wrapper around the brms::predict.brmsfit() function, which obtains predicted values (and their summary) from the posterior distribution. See brms::predict.brmsfit() for details.

Usage

## S3 method for class 'bgmfit'
predict_draws(
  model,
  newdata = NULL,
  resp = NULL,
  dpar = NULL,
  ndraws = NULL,
  draw_ids = NULL,
  re_formula = NA,
  allow_new_levels = FALSE,
  sample_new_levels = "uncertainty",
  incl_autocor = TRUE,
  numeric_cov_at = NULL,
  levels_id = NULL,
  avg_reffects = NULL,
  aux_variables = NULL,
  ipts = 10,
  deriv = 0,
  deriv_model = TRUE,
  summary = TRUE,
  robust = FALSE,
  transform = NULL,
  probs = c(0.025, 0.975),
  xrange = NULL,
  xrange_search = NULL,
  parms_eval = FALSE,
  parms_method = "getPeak",
  idata_method = NULL,
  verbose = FALSE,
  fullframe = NULL,
  dummy_to_factor = NULL,
  expose_function = FALSE,
  usesavedfuns = NULL,
  clearenvfuns = NULL,
  funlist = NULL,
  envir = NULL,
  ...
)

predict_draws(model, ...)

Arguments

model

An object of class bgmfit.

newdata

An optional data frame for estimation. If NULL (default), newdata is retrieved from the model.

resp

A character string (default NULL) to specify the response variable when processing posterior draws for univariate_by and multivariate models. See bsitar() for details on univariate_by and multivariate models.

dpar

Optional name of a predicted distributional parameter. If specified, expected predictions of this parameters are returned.

ndraws

A positive integer indicating the number of posterior draws to use in estimation. If NULL (default), all draws are used.

draw_ids

An integer specifying the specific posterior draw(s) to use in estimation (default NULL).

re_formula

Option to indicate whether or not to include individual/group-level effects in the estimation. When NA (default), individual-level effects are excluded, and population average growth parameters are computed. When NULL, individual-level effects are included in the computation, and the resulting growth parameters are individual-specific. In both cases (NA or NULL), continuous and factor covariates are appropriately included in the estimation. Continuous covariates are set to their means by default (see numeric_cov_at for details), while factor covariates remain unaltered, allowing for the estimation of covariate-specific population average and individual-specific growth parameters.

allow_new_levels

A flag indicating if new levels of group-level effects are allowed (defaults to FALSE). Only relevant if newdata is provided.

sample_new_levels

Indicates how to sample new levels for grouping factors specified in re_formula. This argument is only relevant if newdata is provided and allow_new_levels is set to TRUE. If "uncertainty" (default), each posterior sample for a new level is drawn from the posterior draws of a randomly chosen existing level. Each posterior sample for a new level may be drawn from a different existing level such that the resulting set of new posterior draws represents the variation across existing levels. If "gaussian", sample new levels from the (multivariate) normal distribution implied by the group-level standard deviations and correlations. This options may be useful for conducting Bayesian power analysis or predicting new levels in situations where relatively few levels where observed in the old_data. If "old_levels", directly sample new levels from the existing levels, where a new level is assigned all of the posterior draws of the same (randomly chosen) existing level.

incl_autocor

A flag indicating if correlation structures originally specified via autocor should be included in the predictions. Defaults to TRUE.

numeric_cov_at

An optional (named list) argument to specify the value of continuous covariate(s). The default NULL option sets the continuous covariate(s) to their mean. Alternatively, a named list can be supplied to manually set these values. For example, numeric_cov_at = list(xx = 2) will set the continuous covariate variable 'xx' to 2. The argument numeric_cov_at is ignored when no continuous covariates are included in the model.

levels_id

An optional argument to specify the ids for the hierarchical model (default NULL). It is used only when the model is applied to data with three or more levels of hierarchy. For a two-level model, levels_id is automatically inferred from the model fit. For models with three or more levels, levels_id is inferred from the model fit under the assumption that hierarchy is specified from the lowest to the uppermost level, i.e., id followed by study, where id is nested within study. However, it is not guaranteed that levels_id is sorted correctly, so it is better to set it manually when fitting a model with three or more levels of hierarchy.

avg_reffects

An optional argument (default NULL) to calculate (marginal/average) curves and growth parameters, such as APGV and PGV. If specified, it must be a named list indicating the over (typically a level 1 predictor, such as age), feby (fixed effects, typically a factor variable), and reby (typically NULL, indicating that parameters are integrated over the random effects). For example, avg_reffects = list(feby = 'study', reby = NULL, over = 'age').

aux_variables

An optional argument to specify the variable(s) that can be passed to the ipts argument (see below). This is useful when fitting location-scale models and measurement error models. If post-processing functions throw an error such as variable 'x' not found in either 'data' or 'data2', consider using aux_variables.

ipts

An integer to set the length of the predictor variable for generating a smooth velocity curve. If NULL, the original values are returned. If an integer (e.g., ipts = 10, default), the predictor is interpolated. Note that these interpolations do not alter the range of the predictor when calculating population averages and/or individual-specific growth curves.

deriv

An integer indicating whether to estimate the distance curve or its derivative (velocity curve). The default deriv = 0 is for the distance curve, while deriv = 1 is for the velocity curve.

deriv_model

A logical value specifying whether to estimate the velocity curve from the derivative function or by differentiating the distance curve. Set deriv_model = TRUE for functions that require the velocity curve, such as growthparameters() and plot_curves(). Set it to NULL for functions that use the distance curve (i.e., fitted values), such as loo_validation() and plot_ppc().

summary

A logical value indicating whether only the estimate should be computed (TRUE), or whether the estimate along with SE and CI should be returned (FALSE, default). Setting summary to FALSE will increase computation time. Note that summary = FALSE is required to obtain correct estimates when re_formula = NULL.

robust

A logical value to specify the summary options. If FALSE (default), the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and median absolute deviation (MAD) are applied instead. Ignored if summary is FALSE.

transform

A function applied to individual draws from the posterior distribution before computing summaries. The argument transform is based on the marginaleffects::predictions() function. This should not be confused with transform from brms::posterior_predict(), which is now deprecated.

probs

The percentiles to be computed by the quantile function. Only used if summary is TRUE.

xrange

An integer to set the predictor range (e.g., age) when executing the interpolation via ipts. By default, NULL sets the individual-specific predictor range. Setting xrange = 1 applies the same range for individuals within the same higher grouping variable (e.g., study). Setting xrange = 2 applies an identical range across the entire sample. Alternatively, a numeric vector (e.g., xrange = c(6, 20)) can be provided to set the range within the specified values.

xrange_search

A vector of length two or a character string 'range' to set the range of the predictor variable (x) within which growth parameters are searched. This is useful when there is more than one peak and the user wants to summarize the peak within a specified range of the x variable. The default value is xrange_search = NULL.

parms_eval

A logical value to specify whether or not to compute growth parameters on the fly. This is for internal use only and is mainly needed for compatibility across internal functions.

parms_method

A character string specifying the method used when evaluating parms_eval. The default method is getPeak, which uses the sitar::getPeak() function from the sitar package. Alternatively, findpeaks uses the findpeaks function from the pracma package. This parameter is for internal use and ensures compatibility across internal functions.

idata_method

A character string to indicate the interpolation method. The number of interpolation points is set by the ipts argument. Available options for idata_method are method 1 (specified as 'm1') and method 2 (specified as 'm2').

  • Method 1 ('m1') is adapted from the iapvbs package and is documented here.

  • Method 2 ('m2') is based on the JMbayes package and is documented here. The 'm1' method works by internally constructing the data frame based on the model configuration, while the 'm2' method uses the exact data frame from the model fit, accessible via fit$data. If idata_method = NULL (default), method 'm2' is automatically selected. Note that method 'm1' may fail in certain cases, especially when the model includes covariates (particularly in univariate_by models). In such cases, it is recommended to use method 'm2'.

verbose

A logical argument (default FALSE) to specify whether to print information collected during the setup of the object(s).

fullframe

A logical value indicating whether to return a fullframe object in which newdata is bound to the summary estimates. Note that fullframe cannot be used with summary = FALSE, and it is only applicable when idata_method = 'm2'. A typical use case is when fitting a univariate_by model. This option is mainly for internal use.

dummy_to_factor

A named list (default NULL) to convert dummy variables into a factor variable. The list must include the following elements:

  • factor.dummy: A character vector of dummy variables to be converted to factors.

  • factor.name: The name for the newly created factor variable (default is 'factor.var' if NULL).

  • factor.level: A vector specifying the factor levels. If NULL, levels are taken from factor.dummy. If factor.level is provided, its length must match factor.dummy.

expose_function

A logical argument (default FALSE) to indicate whether Stan functions should be exposed. If TRUE, any Stan functions exposed during the model fit using expose_function = TRUE in the bsitar() function are saved and can be used in post-processing. By default, expose_function = FALSE in post-processing functions, except in optimize_model() where it is set to NULL. If NULL, the setting is inherited from the original model fit. It must be set to TRUE when adding fit criteria or bayes_R2 during model optimization.

usesavedfuns

A logical value (default NULL) indicating whether to use already exposed and saved Stan functions. This is typically set automatically based on the expose_functions argument from the bsitar() call. Manual specification of usesavedfuns is rarely needed and is intended for internal testing, as improper use can lead to unreliable estimates.

clearenvfuns

A logical value indicating whether to clear the exposed Stan functions from the environment (TRUE) or not (FALSE). If NULL, clearenvfuns is set based on the value of usesavedfuns: TRUE if usesavedfuns = TRUE, or FALSE if usesavedfuns = FALSE.

funlist

A list (default NULL) specifying function names. This is rarely needed, as required functions are typically retrieved automatically. A use case for funlist is when sigma_formula, sigma_formula_gr, or sigma_formula_gr_str use an external function (e.g., poly(age)). The funlist should include function names defined in the globalenv(). For functions needing both distance and velocity curves (e.g., plot_curves(..., opt = 'dv')), funlist must include two functions: one for the distance curve and one for the velocity curve.

envir

The environment used for function evaluation. The default is NULL, which sets the environment to parent.frame(). Since most post-processing functions rely on brms, it is recommended to set envir = globalenv() or envir = .GlobalEnv, especially for derivatives like velocity curves.

...

Additional arguments passed to the brms::predict.brmsfit() function. Please see brms::predict.brmsfit() for details on the various options available.

Details

The predict_draws() function computes the fitted values from the posterior distribution. The brms::predict.brmsfit() function from the brms package can be used to obtain predicted (distance) values when the outcome (e.g., height) is untransformed. However, when the outcome is log or square root transformed, the brms::predict.brmsfit() function will return the fitted curve on the log or square root scale. In contrast, the predict_draws() function returns the fitted values on the original scale. Furthermore, predict_draws() also computes the first derivative (velocity), again on the original scale, after making the necessary back-transformation. Aside from these differences, both functions (brms::predict.brmsfit() and predict_draws()) work similarly. In other words, the user can specify all the options available in brms::predict.brmsfit().

Value

An array of predicted response values. See brms::predict.brmsfit() for details.

Author(s)

Satpal Sandhu satpal.sandhu@bristol.ac.uk

See Also

brms::predict.brmsfit()

Examples




# Fit Bayesian SITAR model

# To avoid mode estimation, which takes time, the Bayesian SITAR model is fit 
# to the 'berkeley_exdata' and saved as an example fit ('berkeley_exfit').
# See the 'bsitar' function for details on 'berkeley_exdata' and 
# berkeley_exfit'.

# Check and confirm whether the model fit object 'berkeley_exfit' exists
 berkeley_exfit <- getNsObject(berkeley_exfit)

model <- berkeley_exfit

# Population average distance curve
predict_draws(model, deriv = 0, re_formula = NA)

# Individual-specific distance curves
predict_draws(model, deriv = 0, re_formula = NULL)

# Population average velocity curve
predict_draws(model, deriv = 1, re_formula = NA)

# Individual-specific velocity curves
predict_draws(model, deriv = 1, re_formula = NULL)
 
 

[Package bsitar version 0.3.2 Index]