is_separated {spaMM} | R Documentation |
Checking for (quasi-)separation in binomial-response models.
Description
Separation occurs in binomial response models when a combination of the predictor variables perfectly predicts a level of the response. In such a case the estimates of the coefficients for these variables diverge to (+/-)infinity, and the numerical algorithms typically fail. To anticipate such a problem, the fitting functions in spaMM try to check for separation by default. The check may take a long time, and is skipped if the “problem size” exceeds a threshold defined by spaMM.options(separation_max=<.>), in which case a message tells users by how much they should increase separation_max to force the check (its exact meaning and default value are subject to change without notice, but the default value aims to correspond to a separation check time of the order of 1s on the author's computer).
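The divergence described above can be seen with base R alone. The following is a minimal sketch, not part of spaMM: the data frame d_sep and the object fit are hypothetical names, and stats::glm is used only to illustrate what happens when no separation check is run beforehand.

```r
# Hypothetical toy data: y is 0 for x <= 5 and 1 for x > 5,
# so the predictor x perfectly separates the two response levels.
d_sep <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))

# glm() warns about non-convergence and about fitted probabilities
# numerically equal to 0 or 1; the warnings are silenced here.
fit <- suppressWarnings(glm(y ~ x, family = binomial(), data = d_sep))

coef(fit)       # the slope for x is very large in absolute value
fit$converged   # whether the IRLS iterations converged
range(abs(fitted(fit) - d_sep$y))  # fitted probabilities are pushed to 0 or 1
```

The large coefficient is an artifact of the iteration limit rather than an estimate: allowing more iterations would push it further away from zero.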
is_separated is a convenient interface to procedures from the ROI package, allowing them to be called explicitly by the user to check bootstrap samples (see Example in anova).
is_separated.formula is a variant (not yet a formal S3 method) that performs the same check, but using arguments similar to those of fitme(., family=binomial()).
Usage
is_separated(x, y, verbose = TRUE, solver=spaMM.getOption("sep_solver"))
is_separated.formula(formula, ..., separation_max=spaMM.getOption("separation_max"),
solver=spaMM.getOption("sep_solver"))
Arguments
x |
Design matrix for fixed effects. |
y |
Numeric response vector |
formula |
A model formula |
... |
|
separation_max |
numeric: a non-default value allows for easier local control of this spaMM option. |
solver |
character: name of linear programming solver used to assess separation; passed to |
verbose |
Whether to print some messages (e.g., pointing to model terms that cause separation) or not. |
Value
Returns a boolean; TRUE means there is (quasi-)separation. Screen output may give further information, such as pointing to model terms that cause separation.
References
The method accessible by solver="glpk" implements algorithms described by
Konis, K. 2007. Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models. DPhil Thesis, Univ. Oxford.
See Also
See also the 'safeBinaryRegression' and 'detectseparation' packages.
Examples
set.seed(123)
d <- data.frame(success = rbinom(10, size = 1, prob = 0.9), x = 1:10)
is_separated.formula(formula= success~x, data=d) # FALSE
is_separated.formula(formula= success~I(success^2), data=d) # TRUE
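The matrix interface is_separated(x, y) can be called in a similar way, for instance on simulated responses as in a bootstrap. The following is a minimal sketch, assuming that the ROI package and the solver plug-in named by spaMM.getOption("sep_solver") are installed. The objects d2, X, p and sep_boot, as well as the simulation settings, are hypothetical and not taken from the package's own examples.

```r
library(spaMM)

# Hypothetical toy data: y is 0 for x <= 5 and 1 for x > 5 (perfect separation by x).
d2 <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))
X <- model.matrix(~ x, data = d2)   # design matrix for fixed effects (intercept and x)

# Check the original response; verbose = FALSE suppresses the messages.
sep_original <- is_separated(X, d2$y, verbose = FALSE)

# Bootstrap-style check: simulate new responses from assumed success
# probabilities and record which replicates show (quasi-)separation.
set.seed(123)
p <- plogis(-2 + 0.4 * d2$x)        # assumed probabilities, for illustration only
sep_boot <- vapply(seq_len(20), function(i) {
  y_star <- rbinom(nrow(d2), size = 1, prob = p)
  isTRUE(is_separated(X, y_star, verbose = FALSE))
}, logical(1))

mean(sep_boot)   # proportion of simulated samples flagged as separated
```

Replicates flagged in this way could then be discarded or handled separately before refitting, which is the use the description suggests for bootstrap samples.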