trimkmeans {trimcluster} | R Documentation |
Trimmed k-means clustering
Description
Implements the trimmed k-means clustering method of Cuesta-Albertos, Gordaliza and Matran (1997), which optimizes the k-means criterion after trimming a given proportion of the points.
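As a reading aid, one common way to write the sample version of this criterion is sketched below. The notation is an assumption made here, not taken from this page: x_1, ..., x_n are the observations in p dimensions, alpha plays the role of trim, m_1, ..., m_k are the cluster means, and Y is the set of retained (non-trimmed) points. The exact rounding used for the number of retained points is also an assumption.

```latex
\min_{\substack{Y \subseteq \{x_1,\dots,x_n\} \\ |Y| = \lceil n(1-\alpha) \rceil}}
\ \min_{m_1,\dots,m_k \in \mathbb{R}^p}
\ \frac{1}{|Y|} \sum_{x \in Y} \ \min_{1 \le j \le k} \lVert x - m_j \rVert^2
```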
Usage
trimkmeans(data,k,trim=0.1, scaling=FALSE,
runs=500, niter1=3, niter2=20, nkeep=5, points=NULL,
countmode, printcrit, maxit,
parallel=FALSE, n.cores=-1, trace=0, ...)
## S3 method for class 'tkm'
print(x, ...)
## S3 method for class 'tkm'
plot(x, data, ...)
Arguments
data |
matrix or data.frame with raw data |
k |
integer. Number of clusters. |
trim |
numeric between 0 and 1. Proportion of points to be trimmed. |
scaling |
logical. If |
runs |
The number of random initializations to be performed. |
niter1 |
The number of concentration steps to be performed for each of the runs random initializations. |
niter2 |
The maximum number of concentration steps to be performed for the
|
nkeep |
The number of iterated initializations (after niter1 concentration steps) with the best values of the target function that are kept for further iterations. |
points |
|
countmode |
(deprecated) optional positive integer. Every |
printcrit |
(deprecated) logical. If |
maxit |
(deprecated, use the combination |
parallel |
A logical value specifying whether the runs random initializations should be done in parallel. |
n.cores |
The number of cores to use when parallelizing; only taken into account if parallel=TRUE. |
trace |
Defines the tracing level, which is set to 0 by default. Tracing level 1 gives additional information on the stage of the iterative process. |
x |
object of class |
... |
further arguments to be transferred to |
Details
The function trimkmeans() now calls the function tkmeans() from the package tclust. This makes the procedure much faster since (a) tkmeans() is implemented in C++, (b) a new random initialization is introduced (see the parameters niter1, niter2 and nkeep, which replace the previous maxit), and (c) it is possible to run the initializations in parallel (see the arguments parallel and n.cores).
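The two-stage initialization described above can be sketched in base R. This is only an illustration of the idea, not the C++ code in tclust: the helper names sq_dist, c_step and two_stage are hypothetical, random data rows are assumed as starting centers, ceiling((1 - trim) * n) is an assumed count of retained points, and the second stage simply runs niter2 steps instead of stopping at convergence.

```r
# Illustrative sketch only; NOT the tclust/trimcluster implementation.

# Squared Euclidean distances from each row of X to each row of centers (n x k).
sq_dist <- function(X, centers) {
  vapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(X, 2, centers[j, ])^2)
  }, numeric(nrow(X)))
}

# One concentration step: assign to nearest center, trim, recompute means.
# The returned crit is the trimmed criterion evaluated at the INPUT centers.
c_step <- function(X, centers, trim) {
  n <- nrow(X)
  d <- sq_dist(X, centers)
  nearest <- max.col(-d, ties.method = "first")
  dmin <- d[cbind(seq_len(n), nearest)]
  n_keep <- ceiling((1 - trim) * n)        # assumed retention count
  keep <- order(dmin)[seq_len(n_keep)]     # points that are not trimmed
  for (j in seq_len(nrow(centers))) {
    members <- keep[nearest[keep] == j]
    if (length(members) > 0) {
      centers[j, ] <- colMeans(X[members, , drop = FALSE])
    }
  }
  list(centers = centers, crit = mean(dmin[keep]))
}

# Stage 1: niter1 steps for each of `runs` random starts.
# Stage 2: keep the nkeep best starts and iterate them niter2 more steps.
two_stage <- function(X, k, trim = 0.1, runs = 20,
                      niter1 = 3, niter2 = 20, nkeep = 5) {
  stopifnot(nkeep <= runs)
  run_steps <- function(centers, niter) {
    res <- list(centers = centers, crit = Inf)
    for (i in seq_len(niter)) res <- c_step(X, res$centers, trim)
    res
  }
  starts <- lapply(seq_len(runs), function(r) {
    run_steps(X[sample(nrow(X), k), , drop = FALSE], niter1)
  })
  crits <- vapply(starts, function(s) s$crit, numeric(1))
  best <- starts[order(crits)[seq_len(nkeep)]]
  finals <- lapply(best, function(s) run_steps(s$centers, niter2))
  final_crits <- vapply(finals, function(s) s$crit, numeric(1))
  finals[[which.min(final_crits)]]
}

# Two well-separated groups plus a few scattered points (made-up data).
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = -5), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2),
           matrix(rnorm(10, sd = 20), ncol = 2))
fit <- two_stage(X, k = 2)
fit$centers
```

Spending only a few steps on many starts and refining just the most promising ones is what lets a large value of runs stay affordable.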
plot.tkm calls plotcluster if the dimensionality of the data p is 1, shows a scatterplot with non-trimmed regions if p=2, and shows discriminant coordinates computed from the clusters (ignoring the trimmed points) if p>2.
Value
An object of class 'tkm', which is a list with components
classification |
integer vector coding cluster membership with trimmed
observations coded as |
means |
numerical matrix giving the mean vectors of the k classes. |
disttom |
vector of squared Euclidean distances of all points to the closest mean. |
ropt |
maximum value of |
k |
see above. |
trim |
see above. |
runs |
see above. |
scaling |
see above. |
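To make the relationship between the means and disttom components concrete, the following base-R sketch recomputes squared distances to the closest mean on a tiny made-up data set. The data, the means matrix and the variable names are hypothetical; the sketch does not call trimkmeans() and makes no claim about how scaling affects the reported distances.

```r
# Four made-up points and two made-up cluster means (one mean per row).
X <- rbind(c(0, 0), c(1, 0), c(10, 10), c(9, 10))
means <- rbind(c(0.5, 0), c(9.5, 10))

# n x k matrix of squared Euclidean distances from each point to each mean.
sq <- apply(means, 1, function(m) rowSums(sweep(X, 2, m)^2))

# Squared distance of every point to its closest mean, and that mean's index.
disttom <- apply(sq, 1, min)
nearest <- apply(sq, 1, which.min)

disttom
nearest
```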
Author(s)
Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche/
References
Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.
See Also
Examples
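library(trimcluster)
set.seed(10001)
# Sizes of the three clusters (n1, n2, n3) and of the noise component (n0)
n1 <- 60
n2 <- 60
n3 <- 70
n0 <- 10
nn <- n1 + n2 + n3 + n0
pp <- 2
X <- matrix(0, nrow = nn, ncol = pp)
ii <- 0
# Cluster 1: centered at (5, -5), standard deviation 1
for (i in 1:n1) {
  ii <- ii + 1
  X[ii, ] <- c(5, -5) + rnorm(2)
}
# Cluster 2: centered at (5, 5), standard deviation 0.75
for (i in 1:n2) {
  ii <- ii + 1
  X[ii, ] <- c(5, 5) + rnorm(2) * 0.75
}
# Cluster 3: centered at (-5, -5), standard deviation 0.75
for (i in 1:n3) {
  ii <- ii + 1
  X[ii, ] <- c(-5, -5) + rnorm(2) * 0.75
}
# Noise: widely scattered points around the origin, standard deviation 8
for (i in 1:n0) {
  ii <- ii + 1
  X[ii, ] <- rnorm(2) * 8
}
# Fit three clusters, trimming 10% of the points
tkm1 <- trimkmeans(X, k = 3, trim = 0.1, runs = 5)
## runs=5 is used to save computing time; runs must be >= nkeep
print(tkm1)
plot(tkm1, X)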