rctglm {postcard} | R Documentation
Fit GLM and find any estimand (marginal effect) using plug-in estimation with variance estimation using influence functions
Description
The procedure uses plug-in estimation and influence functions to perform robust inference of any specified estimand in the setting of a randomised clinical trial, even when the effect of covariates is heterogeneous across randomisation groups. See "Powering RCTs for marginal effects with GLMs using prognostic score adjustment" by Højbjerre-Frandsen et al. (2025) for more details on the methodology.
Usage
rctglm(
formula,
exposure_indicator,
exposure_prob,
data,
family = gaussian,
estimand_fun = "ate",
estimand_fun_deriv0 = NULL,
estimand_fun_deriv1 = NULL,
cv_variance = FALSE,
cv_variance_folds = 10,
verbose = options::opt("verbose"),
...
)
Arguments
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in the glm documentation. |
exposure_indicator |
(name of) the binary variable in |
exposure_prob |
a |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. |
family |
a description of the error distribution and link function to be used in the model. For |
estimand_fun |
a |
estimand_fun_deriv0 |
a |
estimand_fun_deriv1 |
a |
cv_variance |
a |
cv_variance_folds |
a |
verbose |
|
... |
Additional arguments passed to |
Details
The procedure assumes the setup of a randomised clinical trial with observations grouped by a binary exposure_indicator variable, allocated randomly with probability exposure_prob. A GLM is fit and then used to predict the response of all observations in the event that the exposure_indicator is 0 and 1, respectively. Taking means of these predictions produces the counterfactual means psi0 and psi1, and an estimand r(psi0, psi1) is calculated using any specified estimand_fun.
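The plug-in step described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the internals of rctglm: the simulated data and the names dat, fit, pred0, pred1 and ate_fun are hypothetical and chosen only for this example.

```r
# Minimal sketch of plug-in estimation of counterfactual means (base R only).
# All object names here are illustrative and not part of the postcard API.
set.seed(1)
n <- 200
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Fit the GLM on the observed data
fit <- glm(Y ~ X1 + A, data = dat, family = gaussian)

# Predict the response of every observation as if A were 0 and 1, respectively
pred0 <- predict(fit, newdata = transform(dat, A = 0), type = "response")
pred1 <- predict(fit, newdata = transform(dat, A = 1), type = "response")

# Counterfactual means
psi0 <- mean(pred0)
psi1 <- mean(pred1)

# Estimand r(psi0, psi1); here the average treatment effect
ate_fun <- function(psi1, psi0) psi1 - psi0
ate_hat <- ate_fun(psi1, psi0)
```

In this particular setup (identity link, no interaction with A) the plug-in estimate coincides with the fitted coefficient of A. With a non-identity link or with interactions the two generally differ, which is where averaging the counterfactual predictions matters.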
The variance of the estimand is found by taking the variance of the influence function of the estimand. If cv_variance is TRUE, then the counterfactual predictions for each observation (which are used to calculate the value of the influence function) are obtained as out-of-sample (OOS) predictions using cross validation with the number of folds specified by cv_variance_folds. The cross validation splits are performed using stratified sampling with exposure_indicator as the strata argument in rsample::vfold_cv.
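As a rough illustration of the variance step, the sketch below computes out-of-sample counterfactual predictions and an influence-function-based standard error for the ATE in base R. It rests on assumptions that this page does not confirm: the influence function is taken to be the standard form for counterfactual means in a randomised trial, the fold assignment is a hand-rolled stand-in for rsample::vfold_cv with strata, and all names (predict_counterfactual, fold_id, oos, if0, if1) are hypothetical.

```r
# Sketch: influence-function variance with out-of-sample predictions (base R).
# Assumed, not taken from the package: the influence function form below and
# the manual stratified fold assignment. All names are illustrative.
set.seed(2)
n <- 200
exposure_prob <- 0.5
dat <- data.frame(X1 = rnorm(n), A = rbinom(n, 1, exposure_prob))
dat$Y <- 1 + 1.5 * dat$X1 + 2 * dat$A + rnorm(n)

# Fit on `train`, return counterfactual predictions for the rows of `test`
predict_counterfactual <- function(train, test) {
  fit <- glm(Y ~ X1 + A, data = train, family = gaussian)
  data.frame(
    mu0 = predict(fit, newdata = transform(test, A = 0), type = "response"),
    mu1 = predict(fit, newdata = transform(test, A = 1), type = "response")
  )
}

# Plug-in counterfactual means from the full-data fit
full <- predict_counterfactual(dat, dat)
psi0 <- mean(full$mu0)
psi1 <- mean(full$mu1)

# Assign folds separately within each exposure group (stratified split)
folds <- 10
fold_id <- integer(n)
for (a in 0:1) {
  idx <- which(dat$A == a)
  fold_id[idx] <- sample(rep_len(seq_len(folds), length(idx)))
}

# Out-of-sample counterfactual predictions, one held-out fold at a time
oos <- data.frame(mu0 = numeric(n), mu1 = numeric(n))
for (k in seq_len(folds)) {
  held_out <- fold_id == k
  p <- predict_counterfactual(dat[!held_out, ], dat[held_out, ])
  oos$mu0[held_out] <- p$mu0
  oos$mu1[held_out] <- p$mu1
}

# Assumed influence functions of the counterfactual means
if0 <- (1 - dat$A) / (1 - exposure_prob) * (dat$Y - oos$mu0) + oos$mu0 - psi0
if1 <- dat$A / exposure_prob * (dat$Y - oos$mu1) + oos$mu1 - psi1

# For the ATE, the derivatives of r(psi0, psi1) are 1 (psi1) and -1 (psi0)
if_ate <- 1 * if1 + (-1) * if0
var_ate <- var(if_ate) / n
se_ate <- sqrt(var_ate)
```

Using held-out predictions in the influence function is intended to keep the variance estimate from being too optimistic when the model is flexible. For a small parametric GLM like this one, the in-sample and out-of-sample versions would be expected to be close.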
Read more in vignette("model-fit").
Value
rctglm returns an object of class inheriting from "rctglm".
An object of class rctglm is a list containing the following components:
- estimand: A data.frame with plug-in estimate of estimand, standard error (SE) estimate and variance estimate of estimand
- estimand_funs: A list with
  - f: The estimand_fun used to obtain an estimate of the estimand from counterfactual means
  - d0: The derivative with respect to psi0
  - d1: The derivative with respect to psi1
- means_counterfactual: A data.frame with counterfactual means psi0 and psi1
- fitted.values_counterfactual: A data.frame with counterfactual mean values, obtained by transforming the linear predictors for each group by the inverse of the link function.
- glm: A glm object returned from running stats::glm within the procedure
- call: The matched call
Estimands
As noted in the description, psi0 and psi1 are the counterfactual means found by prediction using a fitted GLM in the binary groups defined by exposure_indicator.
Default estimand functions can be specified via "ate" (which uses the function function(psi1, psi0) psi1-psi0) and "rate_ratio" (which uses the function function(psi1, psi0) psi1/psi0). See more information on specifying the estimand_fun in vignette("model-fit").
As a default, the Deriv package is used to perform symbolic differentiation to find the derivatives of the estimand_fun.
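To make the role of the derivatives concrete, the sketch below differentiates the rate ratio estimand symbolically. It uses base R's stats::D() as a stand-in so that the example runs without extra packages. It does not call the Deriv package, and the wrapper names d0 and d1 are hypothetical.

```r
# Sketch: symbolic partial derivatives of an estimand function using stats::D().
# This is a base R stand-in, not the Deriv-based mechanism mentioned above.
rate_ratio <- function(psi1, psi0) psi1 / psi0

# Differentiate the function body with respect to each counterfactual mean
d0_expr <- D(body(rate_ratio), "psi0")
d1_expr <- D(body(rate_ratio), "psi1")

# Wrap the derivative expressions as functions of (psi1, psi0);
# eval() looks up psi1 and psi0 in the calling function's frame
d0 <- function(psi1, psi0) eval(d0_expr)
d1 <- function(psi1, psi0) eval(d1_expr)

# Evaluate at example counterfactual means
d0_val <- d0(psi1 = 0.6, psi0 = 0.3)
d1_val <- d1(psi1 = 0.6, psi0 = 0.3)
```

A derivative pair like this is the kind of input that estimand_fun_deriv0 and estimand_fun_deriv1 appear intended to accept when the automatic differentiation is not wanted. The argument descriptions above are incomplete, so treat that reading as an assumption.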
See Also
See how to extract information using methods in rctglm_methods.
Use rctglm_with_prognosticscore() to include prognostic covariate adjustment.
See vignettes
Examples
library(postcard)

# Generate some data to showcase example
n <- 100
exp_prob <- .5
dat_gaus <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = gaussian()
)

# Fit the model
ate <- rctglm(formula = Y ~ .,
              exposure_indicator = A,
              exposure_prob = exp_prob,
              data = dat_gaus,
              family = gaussian)

# Pull information on estimand
estimand(ate)

## Another example with different family and specification of estimand_fun
dat_binom <- glm_data(
  Y ~ 1+1.5*X1+2*A,
  X1 = rnorm(n),
  A = rbinom(n, 1, exp_prob),
  family = binomial()
)

# Use the built-in rate ratio estimand
rr <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial(),
             estimand_fun = "rate_ratio")

# Use a user-defined estimand function (odds ratio)
odds_ratio <- function(psi1, psi0) (psi1*(1-psi0))/(psi0*(1-psi1))
or <- rctglm(formula = Y ~ .,
             exposure_indicator = A,
             exposure_prob = exp_prob,
             data = dat_binom,
             family = binomial,
             estimand_fun = odds_ratio)