An official website of the United States government
In one sense, statistics is practice of summarizing data by reducing the dimensionality of a data set in a sensible way. One way in which the dimensionality of a data set can be reduced is to classify the n items in a dataset into k clusters. One such method of classification is known as the k-means clustering algorithm. In this algorithm the statistician uses a measure of distance between the items to define items in a dataset as similar, and then classifies that item as having membership to one of k clusters.