cLinear {kerntools} | R Documentation |
Compositional kernels
Description
'cLinear()' is the compositional-linear kernel, intended for compositional data (relative frequencies or proportions). 'Aitchison()' is the analogue of the RBF kernel for this type of data. Both kernels expect a matrix or data.frame of non-negative or, preferably, strictly positive numbers. The input has dimension NxD, with N>1 samples and D>1 compositional features.
Usage
cLinear(X, cos.norm = FALSE, feat_space = FALSE, zeros = "none")
Aitchison(X, g = NULL, zeros = "none")
Arguments
X | Matrix or data.frame that contains the compositional data. |
cos.norm | Should the resulting kernel matrix be cosine normalized? (Default: FALSE). |
feat_space | If FALSE, only the kernel matrix is returned. Otherwise, the feature space is also returned. (Default: FALSE). |
zeros | "none" if X is guaranteed to contain no zeros, "pseudo" to replace zeros with a pseudocount. (Default: "none"). |
g | Gamma hyperparameter. If g=0 or NULL, the matrix of squared Aitchison distances is returned instead of the Aitchison kernel matrix. (Default: NULL). |
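As an illustration of what cosine normalization does to a kernel matrix, the following base R sketch applies the usual definition, K[i, j] / sqrt(K[i, i] * K[j, j]), to a small made-up matrix. The matrix K and the names d and K_cos are hypothetical, and the sketch assumes that 'cos.norm = TRUE' follows this standard definition.

```r
# Hypothetical 3x3 kernel matrix (symmetric, positive diagonal).
K <- matrix(c(4, 2,  1,
              2, 9,  3,
              1, 3, 16),
            nrow = 3, byrow = TRUE)

# Cosine normalization: divide each entry by the geometric mean
# of the two corresponding diagonal entries.
d <- sqrt(diag(K))
K_cos <- K / tcrossprod(d)  # tcrossprod(d) is the outer product d %o% d

K_cos
```

After normalization every diagonal entry equals 1, so each off-diagonal entry can be read as the cosine of the angle between two samples in feature space.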
Details
In compositional data, samples (rows) sum to an arbitrary or irrelevant number. This is clearest when working with relative frequencies, as all samples add up to 1 (or 100, or another uninformative value). Zeros are a typical challenge for compositional approaches. They introduce ambiguity because they can have multiple causes: a zero may signal a true absence, or a value so small that it falls below the detection threshold of an instrument. A simple way to deal with zeros is to replace them with a pseudocount. More sophisticated approaches are reviewed elsewhere; see for instance the R package 'zCompositions'.
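The pseudocount approach and the two kernels can be sketched in base R. This is a minimal sketch, not the package's implementation. It assumes that the compositional-linear kernel is a linear kernel on centered log-ratio (clr) transformed data, that the Aitchison kernel has the RBF form exp(-g * d^2) with d the Aitchison distance, and that zeros are handled by adding a pseudocount of 1 to every entry (the rule used by zeros="pseudo" may differ). The names X, Xp, clr, Z and the value of g are illustrative.

```r
# Toy compositional data: 3 samples x 4 features (counts), with zeros.
X <- matrix(c(10,  0,  5, 85,
              20, 30,  0, 50,
              25, 25, 25, 25),
            nrow = 3, byrow = TRUE)

# Assumption: add a pseudocount of 1 to every entry so that logs are defined.
Xp <- X + 1

# Centered log-ratio (clr): log of each part minus the mean log of its row.
clr <- function(M) {
  logM <- log(M)
  logM - rowMeans(logM)  # the row means are recycled down each column
}
Z <- clr(Xp)

# Linear kernel on the clr-transformed data.
K_clin <- tcrossprod(Z)

# Squared Aitchison distances: squared Euclidean distances between clr rows.
D2 <- as.matrix(dist(Z))^2

# RBF-style kernel on those distances; g is a hypothetical gamma value.
g <- 0.1
K_ait <- exp(-g * D2)
```

Because clr subtracts the row mean of the logs, rescaling a sample by a constant (for example, turning counts into proportions before adding any pseudocount) leaves Z unchanged, which is why the row totals are irrelevant to both kernels.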
Value
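Kernel matrix (dimension: NxN). If feat_space = TRUE, 'cLinear()' also returns the feature space. If g = 0 or NULL, 'Aitchison()' returns the NxN matrix of squared Aitchison distances instead.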
References
Ramon, E., Belanche-Muñoz, L., et al. (2021). kernInt: A kernel framework for integrating supervised and unsupervised analyses in spatio-temporal metagenomic datasets. Frontiers in Microbiology, 12, 609048. doi: 10.3389/fmicb.2021.609048
Examples
data <- soil$abund
## These data are sparse and contain many zeros. We can replace them with pseudocounts:
Kclin <- cLinear(data, zeros = "pseudo")
Kclin[1:5, 1:5]
## With the feature space:
Kclin <- cLinear(data, zeros = "pseudo", feat_space = TRUE)
## With cosine normalization:
Kcos <- cLinear(data, zeros = "pseudo", cos.norm = TRUE)
Kcos[1:5, 1:5]
## Aitchison kernel:
Kait <- Aitchison(data, g = 0.0001, zeros = "pseudo")
Kait[1:5, 1:5]