twoterm.lmm.stan {phylopairs} | R Documentation |
twoterm.lmm.stan
Description
Fits the two-term lmm model of Castillo (2007) in a Bayesian framework. Bayesian sampling is conducted in the Stan software via the 'rstan' package. Users supply vectors of observations and an ultrametric phylogenetic tree. Users can alter parameters for model-parameter prior distributions and Bayesian sampling settings. See details.
Usage
twoterm.lmm.stan(des, y, sp1s, sp2s, tree,
iter=2000, chains=4, coef.u=0, coef.sd=10, physig2.u=-1,
physig2.sd=1, randsig2.loc=0, randsig2.sc=2.5, cores=4, ...)
Arguments
des |
A vector of predictor variable observations OR, in the case of multiple predictors, a matrix in which each column is a vector of observations of a given predictor. Function will add a column of 1s to make this a design matrix whose first column corresponds to the model intercept (unless such a column already exists). |
y |
A vector of response variable observations. |
sp1s |
Character vector naming the "species 1" of every pair. IMPORTANT: name formatting must match that used in the tip.label of the tree |
sp2s |
Character vector naming the "species 2" of every pair. IMPORTANT: name formatting must match that used in the tip.label of the tree |
tree |
Phylogenetic tree (a 'phylo' object) for the species included in the analysis (regardless of whether they are species 1 or 2). |
iter |
Number of iterations to run on each chain; defaults to 2000 (more are often necessary). |
chains |
Number of Markov chains to run; defaults to 4. |
coef.u |
Mean of the Gaussian prior for each preditor variable coefficient; defaults to 0. |
coef.sd |
SD of the Gaussian prior for each preditor variable coefficient; defaults to 10. |
physig2.u |
Mean of the prior distribution (lognormal) for the scale of the phylogenetic component of residual covariance; defaults to -1. |
physig2.sd |
SD of the prior distribution (lognormal) for the scale of the phylogenetic component of residual covariance; defaults to 1. |
randsig2.loc |
Location parameter of the prior distribution (cauchy) for the scale of the independent component of residual covariance ; defaults to 0. |
randsig2.sc |
Scale parameter of the prior distribution (cauchy) for the scale of the independent component of residual covariance ; defaults to 2.5. |
cores |
Number of cores to use. |
... |
additional arguments passed to |
Details
The model introduced by Castillo (2007) for analyzing lineage-pair data is a
version of a phylogenetic linear mixed model (plmm) in which there are two
random-effect terms, one for the 'species 1' and another for the 'species 2' in
every pair. The model is Y = Xb + Z1u1 + Z1u2 + epsilon
, where u1 and u2 are
vectors of species-specific random effects that covary according to a phylogenetic
covariance matrix C. In this model, u1 ~ N(0,physig2*C)
, u2 ~ N(0,physig2*C)
, and
epsilon~N(0, randsig2*I)
, where randsig2 is the scaling parameter for identity
matrix I and physig2 is the scaling parameter for phylogenetic covariance
matrix C. See twoterm_lmm_mats
for details on calculating
the Z1 and Z2 matrices.
Prior Distributions for Model Parameters: The underlying stan models assume the following prior distributions for model parameters
Regression coefficients: Gaussian prior (users can set prior mean and sd)
-
physig2
: lognormal prior (users can set prior mean and sd) -
randsig2
: cauchy prior (users can set location and scale parameters of prior)
Value
A list containing two elements: (1) the posterior distribution of model parameters, and (2) the log-likelihood of the posteriors for use in downstream analyses (e.g. the calculation of model fitting metrics like loo or waic). For interpreting model parameters, note that Coef[1]
is the intercept and Coef[2]
, Coef[3]
, ... , Coef[N]
are regression coefficient for the 1st-Nth predictor variables.
References
Castillo, D. M. (2007). Factors contributing to the accumulation of reproductive isolation: A mixed model approach. Ecology and Evolution 7:5808-5820. doi.org/10.1002/ece3.3093
Examples
# Load a dataset simulated with a non-independent response observations
data(data3)
# Also load the simulated tree that was used to generate those pairs
data(sim.tree1)
# Fit an OLS model
result1 = linreg.stan(des=data3[,3], y=data3[,4], cores=2)
# Fit the twoterm.lmm.stan model
result2 = twoterm.lmm.stan(des=data3[,3], y=data3[,4], sp1s=data3[,1],
sp2s=data3[,2], tree=sim.tree1, cores=2)
# Compare posterior parameter estimates
result1[[1]]
result2[[1]]
# Compare the fit of the two models via loo and waic
loo1 = loo::loo(result1[[2]])
loo2 = loo::loo(result2[[2]])
waic1 = loo::waic(result1[[2]])
waic2 = loo::waic(result2[[2]])
loo1
loo2
waic1
waic2
loo::loo_compare(loo1, loo2)
loo::loo_compare(waic1, waic2)