pois4mlogit {spaMM} | R Documentation |
Fit multinomial logit models.
Description
pois4mlogit
fits a multinomial logit model by wrapping iterative fitmv
fits of a multivariate poisson(log)
surrogate model, according to the following logic. In the (mixed or not) multinomial logit model, the probabilities p_{ic}
of the different categories (or types
) c=1,...,C
for the i
th multinomial draw (n_{i1},...,n_{iC})
are of the form
p_{ic}=\frac{e^{\eta_{ic}}}{\sum_{c=1}^C e^{\eta_{ic}}}
where each \eta_{ic}
is a linear predictor. This model can be fitted using surrogate poisson
models wherein the formula
contains a term offset(.dynoffset)
in addition to the terms specifying each type-specific \eta_{.c}
. The procedure will update the .dynoffset
iteratively in the appropriate way to fit the multinomial model.
Since fitmv
is called, its various specific features can be used, such as random effects correlated across the different response types, or fixed-effect coefficients that may well be shared (by its X2X
argument) across response types. Note that a shared Intercept term is not identifiable as implied by the above expression for the p_{ic}
s, and non-identifiability can also result when shared predictor values have shared coefficients.
This feature is experimental. The iterative procedure avoids the need for developing specialized software, although it may be slower than such software would be. Not all post-fit procedures may run or return meaningful results from objects returned by pois4mlogit
. There is a specific predict
method for such objects: this method returns by default the predicted frequencies, summing to 1 for each multinomial draw (see Examples).
Usage
pois4mlogit(submodels, data, to.long=FALSE, init=list(), control=list(),
..., next_inits=c("ranPars","v_h","fixef"),
types, n_iter = 1000L, tol=1e-5, fac=1, progress = FALSE)
## S3 method for class 'pois4mlogit'
predict(object, newdata=NULL, ...)
Arguments
submodels |
Passed to |
data |
Data frame; each line contains a multinomial draw and, as usual, the required predictor variables. The counts for the different response types must form different columns of the |
init |
list; Passed to |
next_inits |
Character vector. Controls initiation of a fit from the result of the result of the previous iteration iteration. If |
to.long |
Boolean; for optional reformatting of the data (see Details and Examples). |
control |
list; passed to |
object , newdata |
|
... |
Further arguments passed to |
types |
Character vector: type labels, i.e., column names of the counts for the different mutlinomial response categories in |
n_iter |
Integer: maximum number of iteration of iterative algorithm. |
tol |
Numeric: tolerance threshold for determining convergence. |
fac |
Numeric: a control parameter of the iterative algorithm. The default value is better to avoid convergence issues. Higher values may speed up the fit but may also increase the chance of non-convergence, particularly those above 1.5. |
progress |
Boolean or numeric: whether to print information about number of iterations, and (if |
Details
At convergence, the dynamic offset should not change over iterations, and the poisson model predictions (offset included) for each multinomial draw should sum to the sample size of the draw. Two convergence criteria, the Ocrit
and the Scrit
respectively, assess these properties. Non-convergence may signal a problem with the fitting procedure (which ca be controlled by its fac
argument), *but* may also signal that the model is not identifiable, in which case it should be modified.
Missing data are handled. This includes the detection and correct handling of cases where (i) a multinomial draw is uninformative because only one type has full information (i.e., response value and all required predictor variables); and (ii) a multinomial draw is informative about some but not all types. In that case, the basic form of the model for a multinomial draw still holds for the relative counts of these types.
When to.long=TRUE
, the poisson surrogate model is fitted to data specified in a long form where each multinomial draw of size s_i
is described as s_i
multinomial draws of size 1 (so that each Poisson submodel is fitted to a response vector of 0s and 1s). In this long format, counts for the different response types still form different columns (here containing only 0 or 1) of the data
. This format is not needed (and not memory-efficient) but it may be useful to compare more easily the likelihood of the surrogate model to the multinomial one (see Examples), and it was used in such a way by Chen & Kuo (2001). The Examples use the utility function reshape2long
to convert the original data frame to the long format.
Value
A list
also inheriting from class HLfit
(as the the return value of the fitmv
call from which the present value is derived), and from class pois4mlogit
.
References
The use of an iteratively updated offset is inspired from the arguments in Chen & Kuo (2001), who described related approaches to fit multinomial logit models using poisson(log) surrogate ones.
Chen, Z. and Kuo, L. (2001) A note on the estimation of the multinomial logit model with random effects. The American Statistician 55, 89-95. https://www.jstor.org/stable/2685993
Examples
#### Fitting a binomial(logit) model by a bivariate poisson(log) surrogate:
## Toy data: Let us say we observe a color polymorphism of irises in 10 populations...
set.seed(123)
ssize <- 10L
shape <- 0.35
toydata <- data.frame(
yellow=rbinom(ssize, 16, prob=rbeta(ssize,shape,shape)),
purple=rbinom(ssize, 16, prob=rbeta(ssize,shape,shape)), # (purple ignored below)
blue=rbinom(ssize, 16, prob=rbeta(ssize,shape,shape)),
phenotype=rnorm(ssize)
)
## Standard binomial fit (purple flowers are ignored here)
(byB <- fitme(cbind(yellow,blue) ~ phenotype, family = binomial(),
data=toydata))
## Surrogate fit
(byP2 <- pois4mlogit(submodels = list(
list(yellow ~ offset(.dynoffset) + phenotype, family = poisson()),
list(blue ~ offset(.dynoffset) + phenotype, family = poisson())),
data = toydata, types=c("yellow","blue")))
#### Recover summaries of the binomial fit from the surrogate fit:
## Coefficients of binomial model recovered as:
fixef(byP2)[1:2]-fixef(byP2)[3:4]
## SEs of coefficients of binomial model recovered as:
{
P2B <- rbind(c(1,0,-1,0),c(0,1,0,-1))
P2B %*% vcov(byP2) %*% t(P2B)
}
# practically equivalent to
vcov(byB)
## Probabilities of each type:
matrix(predict(byP2),ncol=2,
dimnames=list(NULL, c("yellow","blue")))
## logLiks
# The logLiks differ between the binomial and poisson surrogate fits
# because the combinatorial coefficients differ. The relationship
# between these logLiks is simple when the long form of the data is used:
str(long2 <- reshape2long(toydata, c("yellow","blue")))
# Fits on long data:
(byBlong <- fitme(cbind(yellow,blue) ~ phenotype, family = binomial(),
data=long2))
(byPlong <- pois4mlogit(submodels = list(
list(yellow ~ offset(.dynoffset) + phenotype, family = poisson()),
list(blue ~ offset(.dynoffset) + phenotype, family = poisson())),
data = toydata, to.long=TRUE, types=c("yellow","blue")))
# The logLik of the surrogate model is
sum(byPlong$eta*byPlong$y) - sum(exp(byPlong$eta)) # = logLik(byPlong)
# while le logLik of the binomial one can be obtained as
sum(byPlong$eta*byPlong$y) # = logLik(byBlong)
# the difference is 156 __= total multinomial sample size__
# This illustrates that the logLiks of different surrogate models differ
# only by a constant independent of the fitted model.
# Likelihood ratios between surrogate models are then
# identical to likelihood ratios between corresponding binomial models,
# provided a single data format is used throughout the comparisons.