adjClust {adjclust} | R Documentation |
Adjacency-constrained Clustering
Description
Adjacency-constrained hierarchical agglomerative clustering
Usage
adjClust(mat, type = c("similarity", "dissimilarity"), h = ncol(mat) - 1)
Arguments
mat |
A similarity matrix or a dist object. Most sparse formats from
|
type |
Type of matrix: "similarity" or "dissimilarity". Defaults to "similarity"
|
h |
Band width. The similarity between two items is assumed to be 0
when these items are at a distance of more than band width h. Default value
is ncol(mat) - 1 |
Details
Adjacency-constrained hierarchical agglomerative clustering (HAC) is HAC in which each observation is associated with a position, and the clustering is constrained so that only adjacent clusters are merged. These methods are useful in various application fields, including ecology (Quaternary data) and bioinformatics (e.g., Genome-Wide Association Studies (GWAS)).
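To make the adjacency constraint concrete, the following base R sketch merges, at each step, the pair of neighbouring clusters with the highest average similarity. It is only an illustration of the constraint: the function name naive_adjacent_hac and the average-similarity criterion are assumptions made for this example, and this is not the algorithm implemented by adjClust.

```r
# Naive illustration of the adjacency constraint (NOT the adjclust algorithm):
# only clusters that are neighbours along the position axis may be merged.
# naive_adjacent_hac is a hypothetical helper written for this example.
naive_adjacent_hac <- function(sim) {
  stopifnot(is.matrix(sim), nrow(sim) == ncol(sim))
  p <- nrow(sim)
  clusters <- as.list(seq_len(p))  # each cluster is a contiguous run of positions
  merges <- list()
  while (length(clusters) > 1) {
    k <- length(clusters)
    # Score only the k - 1 pairs of neighbouring clusters (j, j + 1)
    scores <- vapply(seq_len(k - 1), function(j) {
      mean(sim[clusters[[j]], clusters[[j + 1]]])
    }, numeric(1))
    best <- which.max(scores)
    merged <- c(clusters[[best]], clusters[[best + 1]])
    merges[[length(merges) + 1]] <- merged
    # Replace the two merged clusters by their union, keeping the order
    clusters <- append(clusters[-c(best, best + 1)], list(merged),
                       after = best - 1)
  }
  merges
}

sim <- matrix(
  c(1.0, 0.1, 0.2, 0.3,
    0.1, 1.0, 0.4, 0.5,
    0.2, 0.4, 1.0, 0.6,
    0.3, 0.5, 0.6, 1.0), nrow = 4)

merges <- naive_adjacent_hac(sim)
print(merges)
```

Because every cluster is a contiguous run of positions, each step compares at most p - 1 candidate pairs instead of all pairs of clusters.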
This function is a fast implementation of the method that takes advantage of
sparse similarity matrices (i.e., matrices whose entries are 0 outside of a
diagonal band of width h). The method is fully described in (Dehman, 2015)
and is based on a kernel version of the algorithm. The different options for
the implementation are described in the package vignette entitled "Notes on
CHAC implementation in adjclust".
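As an illustration of the band assumption, the base R sketch below sets to 0 every similarity between items whose positions differ by more than h. The helper band_truncate is hypothetical and written only for this example; it is not part of adjclust and does not describe how the package handles h internally.

```r
# Hypothetical helper (not part of adjclust): keep only the entries of a
# similarity matrix that lie within a diagonal band of width h.
band_truncate <- function(sim, h) {
  stopifnot(is.matrix(sim), nrow(sim) == ncol(sim), h >= 0)
  # abs(row - col) is the distance between the positions of two items
  sim[abs(row(sim) - col(sim)) > h] <- 0
  sim
}

sim <- matrix(
  c(1.0, 0.1, 0.2, 0.3,
    0.1, 1.0, 0.4, 0.5,
    0.2, 0.4, 1.0, 0.6,
    0.3, 0.5, 0.6, 1.0), nrow = 4)

# With p = 4 items and h = 2, only the pair (1, 4) lies outside the band
banded <- band_truncate(sim, h = 2)
print(banded)
```

With h = ncol(sim) - 1, no pair of items is farther apart than h, so the matrix is left unchanged.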
Value
An object of class chac which describes the tree produced by the clustering
process. The object is a list with the elements merge, height, order, labels,
call, method, and dist.method (the same element names as an hclust object in
base R), and an extra element mat: the data on which the clustering is
performed, possibly after pre-transformations described in the vignette
entitled "Notes on CHAC implementation in adjclust".
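The returned object can be inspected like any R list. The sketch below assumes that the adjclust package is installed and is skipped otherwise. It relies only on the adjClust call shown in Usage and on the element names listed above.

```r
# Minimal sketch, assuming adjclust is installed; skipped otherwise.
if (requireNamespace("adjclust", quietly = TRUE)) {
  sim <- matrix(
    c(1.0, 0.1, 0.2, 0.3,
      0.1, 1.0, 0.4, 0.5,
      0.2, 0.4, 1.0, 0.6,
      0.3, 0.5, 0.6, 1.0), nrow = 4)

  fit <- adjclust::adjClust(sim, "similarity")

  print(class(fit))   # class of the returned object
  print(names(fit))   # elements of the list
  print(fit$merge)    # merge history
  print(fit$height)   # height associated with each merge
}
```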
References
Dehman A. (2015) Spatial Clustering of Linkage Disequilibrium Blocks for Genome-Wide Association Studies, PhD thesis, Universite Paris Saclay.
Ambroise C., Dehman A., Neuvial P., Rigaill G., and Vialaneix N. (2019) Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics, Algorithms for Molecular Biology 14(22).
See Also
snpClust to cluster SNPs based on linkage disequilibrium
hicClust to cluster Hi-C data
Examples
sim <- matrix(
  c(1.0, 0.1, 0.2, 0.3,
    0.1, 1.0, 0.4, 0.5,
    0.2, 0.4, 1.0, 0.6,
    0.3, 0.5, 0.6, 1.0), nrow = 4)
## similarity, full width
fit1 <- adjClust(sim, "similarity")
plot(fit1)
## similarity, h < p-1
fit2 <- adjClust(sim, "similarity", h = 2)
plot(fit2)
## dissimilarity (named dissim to avoid masking base R's dist())
dissim <- as.dist(sqrt(2 - 2 * sim))
## dissimilarity, full width
fit3 <- adjClust(dissim, "dissimilarity")
plot(fit3)
## dissimilarity, h < p-1
fit4 <- adjClust(dissim, "dissimilarity", h = 2)
plot(fit4)