Penrose.dist {smsets}    R Documentation

Penrose's distance calculator

Description

Computes Penrose's distance between m multivariate populations or samples, when information is available on the means and variances.

Usage

Penrose.dist(x, group)

Arguments

x

A data frame with p + 1 columns (one factor and p response variables).

group

The classification factor defining m samples or groups. It must be one of the variables in x.

Details

Let the mean of variable X_k in population i be \mu_{ki} (k = 1, ..., p; i = 1, ..., m), and assume that X_k has the same variance V_k in all m populations. The Penrose (1953) distance P_{ij} between populations i and j is given by

P_{ij} = \sum_{k = 1}^{p} \frac{(\mu_{ki} - \mu_{kj})^2}{pV_k}

Penrose's distances between multivariate samples are computed from this expression after replacing \mu_{ki}, \mu_{kj} and V_k by their corresponding sample estimates.
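The sample version of the formula can be sketched in base R as follows. This is a minimal illustration, not the smsets implementation: the helper penrose_sketch is hypothetical, and it assumes that each V_k is estimated by the pooled within-group variance of X_k (the sum of squared deviations about the group means divided by N - m, where N is the total number of observations).

```r
# Minimal sketch of Penrose's distance (hypothetical helper, not the smsets code).
# Assumption: V_k is the pooled within-group variance of variable k,
# i.e. sum of squared deviations about group means / (N - m).
penrose_sketch <- function(X, g) {
  X <- as.matrix(X)
  g <- factor(g)
  p <- ncol(X)
  m <- nlevels(g)
  # m x p matrix of group means (rows = groups, columns = variables)
  means <- apply(X, 2, function(col) tapply(col, g, mean))
  # Deviations of each observation from its own group mean
  resid <- X - means[as.integer(g), , drop = FALSE]
  # Pooled within-group variance of each variable
  V <- colSums(resid^2) / (nrow(X) - m)
  # P_ij = sum_k (mu_ki - mu_kj)^2 / (p * V_k)
  P <- matrix(0, m, m, dimnames = list(levels(g), levels(g)))
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      P[i, j] <- sum((means[i, ] - means[j, ])^2 / (p * V))
    }
  }
  P
}

# Toy data: two groups of two observations on two variables
X <- data.frame(x1 = c(1, 3, 5, 7), x2 = c(10, 14, 16, 20))
g <- c("A", "A", "B", "B")
P <- penrose_sketch(X, g)
```

For these toy data the group means are (2, 12) and (6, 18), the pooled variances are V_1 = 2 and V_2 = 8, and so P_AB = 16/(2 x 2) + 36/(2 x 8) = 4 + 2.25 = 6.25.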

A disadvantage of Penrose's measure is that it does not consider the correlations between the p variables.
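The effect of ignoring correlations can be seen in a small hypothetical example: two variables with unit variances, group means (0, 0) and (1, 1), and a correlation of either 0 or 0.8. Penrose's P uses only the variances, so it is identical in both cases, whereas the squared Mahalanobis distance D^2 (computed here with base R's stats::mahalanobis) changes. The helper penrose_pair and all numbers are illustrative assumptions.

```r
# Hypothetical helper: Penrose distance between two mean vectors,
# using only the diagonal (the variances) of the covariance matrix S.
penrose_pair <- function(mu_i, mu_j, S) {
  sum((mu_i - mu_j)^2 / (length(mu_i) * diag(S)))
}

mu_i <- c(0, 0)
mu_j <- c(1, 1)
S_uncor <- matrix(c(1, 0, 0, 1), 2, 2)      # correlation 0
S_cor   <- matrix(c(1, 0.8, 0.8, 1), 2, 2)  # correlation 0.8

# Penrose: same value for both matrices (off-diagonal terms are ignored)
p_uncor <- penrose_pair(mu_i, mu_j, S_uncor)
p_cor   <- penrose_pair(mu_i, mu_j, S_cor)

# Squared Mahalanobis distance: depends on the correlation
d2_uncor <- mahalanobis(mu_j, center = mu_i, cov = S_uncor)
d2_cor   <- mahalanobis(mu_j, center = mu_i, cov = S_cor)
```

With uncorrelated variables P equals D^2 / p (here 2 / 2 = 1). With a correlation of 0.8, D^2 falls to 0.4 / 0.36 (about 1.11) while P stays at 1.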

The function requires package biotools (da Silva, 2017, 2021).

Value

Returns an object of class "Penrose.dist", a list containing the following components:

name A character string describing the function.
means.vec A numeric matrix with p rows and m columns giving the mean of each variable per group.
covs.list A list containing the m sample covariance matrices.
Samp.sizes A table showing the number of observations used in the calculation of the covariance matrix for each group.
PooledCov The pooled covariance matrix. This matrix can be accessed and used as an input argument for the calculation of Mahalanobis distance in packages biotools (da Silva, 2017, 2021) and ecodist (Goslee and Urban 2007).
Penrose.mat The Penrose distances given as a "matrix" object.
Penros.dist The Penrose distances given as a "dist" object.
group A character string specifying the name of the classification factor defining groups.
levels.group A vector of length m giving the levels of the factor group.
data.name A character string giving the name of the data.
variables A character vector containing the variable names.
data The data frame analyzed.
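The components covs.list, Samp.sizes and PooledCov fit together in the usual way if PooledCov is the standard pooled within-group covariance matrix, sum_i (n_i - 1) S_i / (N - m). The sketch below assumes that formula (the helper pooled_cov is hypothetical, not part of smsets). It also shows how such a matrix can be passed to base R's stats::mahalanobis to obtain a squared Mahalanobis distance between two group means, alongside the Penrose distance computed from the diagonal of the same matrix.

```r
# Hypothetical helper: pooled within-group covariance matrix,
# assuming S_pooled = sum_i (n_i - 1) S_i / (N - m).
pooled_cov <- function(covs, n) {
  weighted <- Map(function(S, ni) (ni - 1) * S, covs, n)
  Reduce(`+`, weighted) / (sum(n) - length(n))
}

# Toy data: two groups of two observations on two variables
X <- data.frame(x1 = c(1, 3, 5, 7), x2 = c(10, 14, 20, 16))
g <- factor(c("A", "A", "B", "B"))

covs  <- lapply(split(X, g), cov)        # analogue of covs.list
n     <- as.vector(table(g))             # analogue of Samp.sizes
Sp    <- pooled_cov(covs, n)             # analogue of PooledCov
means <- sapply(split(X, g), colMeans)   # p x m matrix, like means.vec

# Squared Mahalanobis distance between the two group means
d2 <- mahalanobis(means[, "B"], center = means[, "A"], cov = Sp)

# Penrose distance uses only the diagonal of the pooled matrix
p <- nrow(means)
P <- sum((means[, "A"] - means[, "B"])^2 / (p * diag(Sp)))
```

Here the within-group covariances (4 and -4) cancel, so the pooled matrix is diagonal, diag(2, 8), and D^2 = 12.5 is exactly p times the Penrose distance 6.25.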

Author(s)

Jorge Navarro Alberto, ganava4@gmail.com

References

da Silva, A.R. (2021). biotools: Tools for Biometry and Applied Statistics in Agricultural Science. R package version 4.2. https://cran.r-project.org/package=biotools.

da Silva, A.R., Malafaia, G., and Menezes, I.P.P. (2017). biotools: an R function to predict spatial gene diversity via an individual-based approach. Genetics and Molecular Research 16. https://doi.org/10.4238/gmr16029655.

Goslee, S.C. and Urban, D.L. (2007). The ecodist package for dissimilarity-based analysis of ecological data. Journal of Statistical Software 22(7): 1-19. https://doi.org/10.18637/jss.v022.i07.

Manly, B.F.J., Navarro Alberto, J.A. and Gerow, K. (2024). Multivariate Statistical Methods: A Primer. 5th Edn. Chapman and Hall/CRC.

Penrose, L.S. (1953). Distance, size and shape. Annals of Eugenics 18: 337-343.

Examples

data(skulls)
res.Penrose <- Penrose.dist(x = skulls, group = Period)
# Brief output
res.Penrose


[Package smsets version 1.2.3 Index]