compare_models {maxent.ot}    R Documentation
Compare Maxent OT models using a variety of methods
Description
Compares two or more models fit to the same data set to determine which provides the best fit, using a variety of methods.
Usage
compare_models(..., method = "lrt")
Arguments
...
Two or more model objects to be compared. These objects should be in the same format as the objects returned by optimize_weights().

method
The method of comparison to use. This currently includes "lrt", "aic", "aic_c", and "bic". Defaults to "lrt".
Details
The available comparison methods are listed below; a short sketch after the list shows how each quantity is computed by hand.
- lrt: The likelihood ratio test. This method can be applied to a maximum of two models, and the parameters of these models (i.e., their constraints) must be in a strict subset/superset relationship. If your models do not meet these requirements, you should use a different method. The likelihood ratio is calculated as follows:

  LR = 2(LL_2 - LL_1)

  where LL_2 is the log likelihood of the model with more parameters and LL_1 is the log likelihood of the model with fewer parameters. A p-value is calculated by conducting a chi-squared test with X^2 = LR and the degrees of freedom set to the difference in number of parameters between the two models. This p-value tells us whether the difference in likelihood between the two models is significant (i.e., whether the extra parameters in the full model are justified by the increase in model fit).
- aic: The Akaike Information Criterion. This is calculated as follows for each model:

  AIC = 2k - 2LL

  where k is the number of model parameters (i.e., constraints) and LL is the model's log likelihood.
- aic_c: The Akaike Information Criterion corrected for small sample sizes. This is calculated as follows:

  AIC_c = 2k - 2LL + \frac{2k^2 + 2k}{n - k - 1}

  where n is the number of samples and the other parameters are identical to those used in the AIC calculation. As n approaches infinity, the final term converges to 0, and so this equation becomes equivalent to AIC. Please see the note below for information about sample sizes.
- bic: The Bayesian Information Criterion. This is calculated as follows:

  BIC = k\ln(n) - 2LL

  As with aic_c, this calculation relies on the number of samples n. Please see the discussion on sample sizes below before using this method.
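A minimal sketch of the arithmetic behind each method, using hypothetical log likelihoods, parameter counts, and sample size (none of these values come from a real model fit):

ll_small <- -110.4  # LL_1: log likelihood of the smaller model
ll_large <- -105.1  # LL_2: log likelihood of the larger model
k_small <- 2        # constraints in the smaller model
k_large <- 3        # constraints in the larger model
n <- 100            # sample size: sum of the surface form frequencies

# Likelihood ratio test
lr <- 2 * (ll_large - ll_small)
p_value <- pchisq(lr, df = k_large - k_small, lower.tail = FALSE)

# Information criteria for the larger model
aic <- 2 * k_large - 2 * ll_large
aic_c <- aic + (2 * k_large^2 + 2 * k_large) / (n - k_large - 1)
bic <- k_large * log(n) - 2 * ll_large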
A few caveats for several of the comparison methods:

The likelihood ratio test (lrt) method applies to exactly two models, and assumes that the parameters of these models are nested: that is, the constraints in the smaller model are a strict subset of the constraints in the larger model. This function will verify this to some extent based on the number and names of constraints.

The Akaike Information Criterion corrected for small sample sizes (aic_c) and the Bayesian Information Criterion (bic) rely on sample sizes in their calculations. The sample size for a data set is defined as the sum of the column of surface form frequencies. If you want to apply these methods, it is important that the values in this column are token counts, not relative frequencies. Applying these methods to relative frequencies, which effectively ignore sample size, will produce invalid results.
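For example, in a sketch assuming a tableau data frame tableaux with a token frequency column named Freq (both names are assumptions for illustration, not requirements of the package):

n <- sum(tableaux$Freq)  # sample size used by aic_c and bic
# If Freq held relative frequencies, each tableau would sum to 1 and n
# would count input forms rather than observed tokens, invalidating the
# AICc and BIC calculations.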
The aic, aic_c, and bic comparison methods return raw AIC/AICc/BIC values as well as weights corresponding to these values. These weights are calculated similarly for each model:
W_i = \frac{\exp(-0.5 \delta_i)}{\sum_{j=1}^{m}{\exp(-0.5 \delta_j)}}

where \delta_i is the difference in score (AIC, AICc, BIC) between model i and the model with the best score, and m is the number of models being compared. These weights provide the relative likelihood or conditional probability of this model being the best model (by whatever definition of "best" is assumed by the measurement type) given the data and the selection of models it is being compared to.
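A minimal sketch of the weight calculation, using three hypothetical AIC scores:

scores <- c(210.2, 212.5, 219.8)
delta <- scores - min(scores)  # delta_i: difference from the best score
weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))
round(weights, 3)  # weights sum to 1 across the compared models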
Value
A data frame containing information about the comparison. The contents and size of this data frame vary depending on the method used. (The Examples section below includes a sketch of accessing these columns.)

- lrt: A data frame with a single row and the following columns:
  - description: the names of the two models being compared. The name of the model with more parameters will be first.
  - chi_sq: the chi-squared value calculated during the test.
  - k_delta: the difference in parameters between the two models, used as degrees of freedom in the chi-squared test.
  - p_value: the p-value calculated by the test.
- aic: A data frame with as many rows as there were models passed in. The models are sorted in ascending order of AIC (i.e., best first). This data frame has the following columns:
  - model: the name of the model.
  - k: the number of parameters.
  - aic: the model's AIC value.
  - aic.delta: the difference between this model's AIC value and the smallest AIC value in the set.
  - aic.wt: the model's AIC weight; this reflects the relative likelihood (or conditional probability) that this model is the "best" model in the set.
  - cum.wt: the cumulative sum of AIC weights up to and including this model.
  - ll: the log likelihood of this model.
- aic_c: The data frame returned here is analogous in structure to the AIC data frame, with AICc values replacing AICs and column names modified accordingly. There is one additional column:
  - n: the number of samples in the data the model is fit to.
- bic: The data frame returned here is analogous in structure to the AIC and AICc data frames. Like the AICc data frame, it contains the n column.
Examples
# Get paths to toy data files
# This file has two constraints
data_file_small <- system.file(
  "extdata", "sample_data_frame.csv", package = "maxent.ot"
)
# This file has three constraints
data_file_large <- system.file(
  "extdata", "sample_data_frame_large.csv", package = "maxent.ot"
)

# Fit weights to both data sets with no biases
tableaux_small <- read.csv(data_file_small)
small_model <- optimize_weights(tableaux_small)
tableaux_large <- read.csv(data_file_large)
large_model <- optimize_weights(tableaux_large)

# Compare models using the likelihood ratio test. This is appropriate here
# because the constraints are nested.
compare_models(small_model, large_model, method = "lrt")

# Compare models using AIC
compare_models(small_model, large_model, method = "aic")

# Compare models using AICc
compare_models(small_model, large_model, method = "aic_c")

# Compare models using BIC
compare_models(small_model, large_model, method = "bic")
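
# Sketch of reading off the results from the models fitted above: the
# returned data frame is sorted best-first, so row 1 holds the winning model
aic_comparison <- compare_models(small_model, large_model, method = "aic")
aic_comparison$model[1]   # name of the model with the lowest AIC
aic_comparison$aic.wt[1]  # its AIC weight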