TY - JOUR
T1 - An efficient k-means clustering algorithm: analysis and implementation
JF - Pattern Analysis and Machine Intelligence, IEEE Transactions on
Y1 - 2002
A1 - Kanungo,T.
A1 - Mount, Dave
A1 - Netanyahu,N. S
A1 - Piatko,C. D
A1 - Silverman,R.
A1 - Wu,A. Y
KW - algorithm;color
KW - algorithm;image
KW - algorithm;kd-tree;mean
KW - analysis;filtering
KW - clustering
KW - clustering;
KW - compression;data
KW - distance;covariance
KW - Lloyd
KW - matrices;filtering
KW - quantization;data
KW - segmentation;k-means
KW - squared
KW - structure;data-sensitive
KW - theory;pattern
AB - In k-means clustering, we are given a set of n data points in d-dimensional space R^{d} and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation
VL - 24
SN - 0162-8828
CP - 7
M3 - 10.1109/TPAMI.2002.1017616
ER -