Cluster.validity {clusterv} | R Documentation |
Validity indices computation
Description
Computes the stability index of each individual cluster, the overall validity index of the clustering and (optionally) the Assignment Confidence (AC) index of each example. The indices are computed from a set of clusterings (Cluster.validity) or from a similarity matrix (Cluster.validity.from.similarity). The labels of the examples are assumed to be integers.
Usage
Cluster.validity(cluster, M.clusters, AC = FALSE)
Cluster.validity.from.similarity(cluster, Sim.M, AC = TRUE)
Arguments
cluster |
list of the clustering whose validity indices will be computed |
M.clusters |
list of the n clusterings (a list of lists) used for validity index computation |
Sim.M |
similarity matrix |
AC |
if it is TRUE the Assignment Confidence index for each example is computed |
Details
Using the similarity matrix M, the stability index s for a cluster A is:
s(A) = \frac{1}{|A|(|A|-1)} \sum_{(i,j) \in A \times A, i\neq j} M_{ij}
The index s(A) estimates the stability of a cluster A by measuring how often the projections of the pairs (i,j) \in A \times A occur together in the same cluster in the projected subspaces.
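As a rough illustration of where such a similarity matrix could come from, the sketch below counts how often each pair of examples falls in the same cluster across a set of clusterings and divides by the number of clusterings. The helper name cooccurrence.matrix and the toy data are hypothetical, and the assumption that M_{ij} is exactly this co-occurrence frequency is ours; the package function Do.similarity.matrix may differ in detail.

```r
# Hypothetical helper (not part of clusterv): pairwise co-occurrence
# frequencies over a set of clusterings.
# Each clustering is a list of integer vectors, one vector of example
# labels per cluster; each example is assumed to belong to one cluster.
cooccurrence.matrix <- function(clusterings, n.examples) {
  M <- matrix(0, nrow = n.examples, ncol = n.examples)
  for (cl in clusterings) {
    for (members in cl) {
      # every pair of examples inside the same cluster gets one count
      M[members, members] <- M[members, members] + 1
    }
  }
  # turn counts into frequencies in [0, 1]
  M / length(clusterings)
}

# Toy data (made up): 3 clusterings of 4 examples
toy.clusterings <- list(
  list(c(1, 2), c(3, 4)),
  list(c(1, 2), c(3, 4)),
  list(c(1, 2, 3), 4)
)
M.toy <- cooccurrence.matrix(toy.clusterings, n.examples = 4)
```

In this toy case examples 1 and 2 are always clustered together, so their entry is the largest possible, while examples 1 and 4 never share a cluster.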
The stability index takes values between 0 and 1: low values indicate unreliable clusters,
high values denote stable clusters.
The overall validity of the clustering is the average of the stability indices of the individual clusters.
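The formula for s(A) and the overall validity can be checked on a small hand-made similarity matrix. The following is a minimal sketch, not the package implementation: the helper name stability.index, the toy matrix and the choice to return NA for single-example clusters (where the formula divides by zero) are all assumptions made for illustration.

```r
# Hypothetical helper (not part of clusterv): stability index s(A)
# of a cluster, given a similarity matrix M and the labels of its members.
stability.index <- function(M, members) {
  k <- length(members)
  # assumption: s(A) is left undefined for clusters with one example
  if (k < 2) return(NA_real_)
  sub <- M[members, members, drop = FALSE]
  # sum over all ordered pairs (i, j) with i != j, divided by |A|(|A|-1)
  (sum(sub) - sum(diag(sub))) / (k * (k - 1))
}

# Toy symmetric similarity matrix for 4 examples (values are made up)
M.toy <- matrix(c(1.0, 0.9, 0.1, 0.2,
                  0.9, 1.0, 0.3, 0.0,
                  0.1, 0.3, 1.0, 0.6,
                  0.2, 0.0, 0.6, 1.0), nrow = 4, byrow = TRUE)

# Clustering to evaluate: two clusters of two examples each
clusters <- list(c(1, 2), c(3, 4))

validity <- sapply(clusters, function(A) stability.index(M.toy, A))
overall.validity <- mean(validity)
```

Here the first cluster has s(A) = 0.9 and the second 0.6, so the overall validity is their average, 0.75.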
The Assignment-Confidence (AC) index estimates the confidence of the assignment of an example i to a cluster A using a similarity matrix M:
AC(i,A) = \frac{1}{|A|-1} \sum_{j \in A, j\neq i} M_{ij}
Using a set of realizations of a given randomized projection, the AC-index represents the frequency with which i appears together with the other elements of the cluster A.
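The AC formula can be sketched in the same way. The helper name ac.index and the toy matrix are hypothetical; the code follows the formula literally for an example i that belongs to A, and returning NA for a single-example cluster is an assumption, since the formula then divides by zero.

```r
# Hypothetical helper (not part of clusterv): Assignment Confidence
# AC(i, A) of example i with respect to cluster A, from similarity matrix M.
ac.index <- function(M, i, members) {
  k <- length(members)
  # assumption: AC is left undefined for clusters with one example
  if (k < 2) return(NA_real_)
  others <- setdiff(members, i)
  # sum of M[i, j] over j in A with j != i, divided by |A| - 1
  sum(M[i, others]) / (k - 1)
}

# Toy symmetric similarity matrix for 4 examples (values are made up)
M.toy <- matrix(c(1.0, 0.9, 0.1, 0.2,
                  0.9, 1.0, 0.3, 0.0,
                  0.1, 0.3, 1.0, 0.6,
                  0.2, 0.0, 0.6, 1.0), nrow = 4, byrow = TRUE)

A <- c(1, 2, 3)
ac.1 <- ac.index(M.toy, 1, A)  # (0.9 + 0.1) / 2
ac.3 <- ac.index(M.toy, 3, A)  # (0.1 + 0.3) / 2
```

Example 1 is assigned to A with higher confidence than example 3, because it co-occurs more often with the other members of A.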
Value
a list with four components: "validity", "overall.validity", "similarity.matrix", "AC" (optional):
validity |
vector with the validity of each of the clusters |
overall.validity |
overall validity index of the clustering |
similarity.matrix |
pairwise similarity matrix between examples |
AC |
matrix with the Assignment Confidence index for each example. Each row corresponds to an example, each column to a cluster |
Author(s)
Giorgio Valentini valentini@di.unimi.it
See Also
Validity.indices, AC.index, Do.similarity.matrix
Examples
# Computation of the validity indices for a hierarchical clustering
M <- generate.sample0(n=10, m=1, sigma=1, dim=1000)
d <- dist (t(M));
tree <- hclust(d, method = "average");
plot(tree, main="");
cl.orig <- rect.hclust(tree, k = 3);
l.PMO <- Multiple.Random.hclustering (M, dim=100, pmethod="PMO",
c=3, hmethod="average", n=20)
list.indices <- Cluster.validity(cl.orig, l.PMO, AC = TRUE)
# Computation of the validity indices for a hierarchical clustering
# with less well-defined clusters (sigma=2 instead of sigma=1)
M.less <- generate.sample0(n=10, m=1, sigma=2, dim=1000)
d <- dist (t(M.less));
tree.less <- hclust(d, method = "average");
plot(tree.less, main="");
cl.orig.less <- rect.hclust(tree.less, k = 3);
l.PMO.less <- Multiple.Random.hclustering (M.less, dim=100, pmethod="PMO",
c=3, hmethod="average", n=20)
list.indices.less <- Cluster.validity(cl.orig.less, l.PMO.less, AC = TRUE)