emBinRegMAR {glmfitmiss}    R Documentation

Fitting binary regression with missing categorical covariates using an Expectation-Maximisation (EM) based method

Description

This function fits generalized linear models for a binary response when some categorical predictors are incomplete. The model is fitted using a likelihood-based EM method, so that observations with missing covariate values contribute to estimation. For more information on the underlying methodology, see Pradhan, Nychka, and Bandyopadhyay (2025).

Usage

emBinRegMAR(
  formula,
  data,
  conflev = 0.95,
  vcorctn = TRUE,
  family = binomial(link = "logit"),
  biascorrectn = TRUE,
  verbose = TRUE
)

Arguments

formula

a formula expression as for regression models, of the form response ~ predictors. The response should be a numeric binary variable with missing values, and predictors can be any variables. Categorical predictors with missing values can be used in the model. See the documentation of formula for other details.

data

Input data for fitting the model

conflev

the confidence level used for the confidence intervals; the default is 0.95.

vcorctn

a TRUE or FALSE value; if TRUE, the variance-covariance matrix is computed using the method of Louis (1982). The default is TRUE.

family

a specification of the model family. The default is family = binomial(link = "logit").

biascorrectn

a TRUE or FALSE value; if TRUE, bias-reduced estimates due to Firth (1993) are computed. The default is TRUE.

verbose

a TRUE or FALSE value; the default is TRUE.
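
For orientation, the two corrections named by the biascorrectn and vcorctn arguments can be written in their standard textbook forms. This is a sketch of the general results in Firth (1993) and Louis (1982), not a statement of how emBinRegMAR implements them. The notation is assumed here: \ell is the log-likelihood, I(\beta) the Fisher information, \ell_c the complete-data log-likelihood, and y_obs the observed data.

```latex
% Firth (1993): penalized log-likelihood for a canonical-link model
% such as logistic regression (Jeffreys-prior penalty)
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \det I(\beta)

% Louis (1982): observed information from complete-data quantities
I_{\mathrm{obs}}(\theta)
  = \mathbb{E}\!\left[ -\nabla^{2} \ell_c(\theta) \mid y_{\mathrm{obs}} \right]
  - \mathrm{Cov}\!\left[ \nabla \ell_c(\theta) \mid y_{\mathrm{obs}} \right]
```

In an EM setting both conditional moments are taken over the missing values given the observed data, which is why the second form is convenient for computing standard errors after the algorithm has converged.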

Details

The family parameter in the emBinRegMAR function allows you to specify the probability distribution and link function for the response variable in the linear model. It determines the nature of the relationship between the predictors and the response variable. The family argument is particularly important when working with binary data, where the response variable has only two possible outcomes. In such cases, you typically want to fit a logistic regression model.

Currently, only family = binomial is supported, for binary data.

You can also specify different link functions within binomial family. The default link function is the logit function, which models the log-odds of success. Other available link functions include:

It is important to choose a link function that matches the characteristics and assumptions of your binary data. The default binomial family with the logit link is often a good starting point, but an alternative link function may be more appropriate depending on the research question and the nature of the data. Note that this function calls the 'emforbeta' function. For details of that function and its output objects, see the documentation of 'emforbeta'.
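
The default family described above can be inspected with base R alone. The sketch below shows what binomial(link = "logit") contains and how the logit link and its inverse behave. The final call assumes that the glmfitmiss package and its ibrahim data set are installed, and it is skipped otherwise.

```r
# Base R only: build the family object used as the default.
fam <- binomial(link = "logit")

fam$family   # name of the distribution family
fam$link     # name of the link function

# The logit link maps a success probability to its log-odds ...
p <- c(0.1, 0.5, 0.9)
eta <- fam$linkfun(p)        # log(p / (1 - p))

# ... and the inverse link maps log-odds back to probabilities.
p_back <- fam$linkinv(eta)
all.equal(p, p_back)

# Assumption: glmfitmiss and its 'ibrahim' data set are available.
# The default family is spelled out explicitly, as in the Usage section.
if (requireNamespace("glmfitmiss", quietly = TRUE)) {
  library(glmfitmiss)
  data(ibrahim)
  fit <- emBinRegMAR(y ~ x1 + x2 + x3, data = ibrahim,
                     family = binomial(link = "logit"))
  print(fit)
}
```

Passing the family as an object rather than as a bare name keeps the link function visible at the call site, which helps when comparing fits.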

Value

Returns the glm estimates.

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates, Biometrika, 80, 27-38. doi:10.2307/2336755.

Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association 85, 765–769.

Kosmidis, I., Firth, D. (2021). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika, 108, 71-82. doi:10.1093/biomet/asaa052.

Louis, T. A. (1982). Finding the observed information when using the EM algorithm. Journal of the Royal Statistical Society, Ser B, 44, 226-233.

Maiti, T., Pradhan, V. (2009). Bias reduction and a solution of separation of logistic regression with missing covariates. Biometrics, 65, 1262-1269.

Pradhan, V., Nychka, D. and Bandyopadhyay, S. (2025). Beyond the Odds: Fitting Logistic Regression with Missing Data in Small Samples (submitted).

Examples

data(ibrahim)
# Fits a logistic regression model with missing categorical covariates using the method of Ibrahim (1990)

fit <- emBinRegMAR(y~x1+x2+x3, data=ibrahim)
fit

data(est45)
f_fit <- emBinRegMAR (resp ~ Fetoprtn + Antigen + Jaundice + Age, data = est45, biascorrectn=FALSE)
f_fit

# -----------------Bias reduced estimates due to Firth (1993) --------------
f_fit1 <- emBinRegMAR (resp ~ Fetoprtn + Antigen + Jaundice + Age, data = est45, biascorrectn=TRUE)
f_fit1

[Package glmfitmiss version 2.1.0 Index]