%0 Journal Article
%J Pattern Analysis and Machine Intelligence, IEEE Transactions on
%D 2002
%T An efficient k-means clustering algorithm: analysis and implementation
%A Kanungo,T.
%A Mount, Dave
%A Netanyahu,N. S
%A Piatko,C. D
%A Silverman,R.
%A Wu,A. Y
%K algorithm;color
%K algorithm;image
%K algorithm;kd-tree;mean
%K analysis;filtering
%K clustering
%K clustering;
%K compression;data
%K distance;covariance
%K Lloyd
%K matrices;filtering
%K quantization;data
%K segmentation;k-means
%K squared
%K structure;data-sensitive
%K theory;pattern
%X In k-means clustering, we are given a set of n data points in d-dimensional space R^{d} and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation
%B Pattern Analysis and Machine Intelligence, IEEE Transactions on
%V 24
%P 881 - 892
%8 2002/07//
%@ 0162-8828
%G eng
%N 7
%R 10.1109/TPAMI.2002.1017616