3.14 K-Means vs. Hierarchical Clustering
With hierarchical clustering, the modeler does not need to specify the number of clusters in advance. Instead, she can generate a dendrogram, examine its structure, and then decide on the number of clusters.
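The dendrogram-first workflow can be sketched with SciPy. This is a minimal illustration on randomly generated data; the dataset, cluster count, and variable names are assumptions, not taken from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))     # 20 records, 2 numeric features (illustrative)

# Build the hierarchy: Ward's method on Euclidean distances
Z = linkage(X, method="ward")

# dendrogram(Z)  # renders the tree when a matplotlib backend is available

# After inspecting the dendrogram, cut the tree at a chosen cluster count
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Note that the cluster count is chosen *after* looking at the tree, which is exactly the flexibility k-means does not offer.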
If you are using a relatively large dataset, you will notice that hierarchical clustering is a bit slower than k-means clustering. Hierarchical clustering has much higher computational complexity, since it requires calculating all the pairwise distances among records, plus the calculations needed to measure distances between clusters, rather than just the distance from each record to the cluster centroids.
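A back-of-the-envelope comparison makes the gap concrete. The values of n and k below are illustrative, not from the text.

```python
# Distance counts for one pass over the data
n, k = 10_000, 5

# Hierarchical clustering starts from every pairwise record distance
pairwise = n * (n - 1) // 2        # 49,995,000 distances

# K-means needs only record-to-centroid distances per iteration
per_iteration = n * k              # 50,000 distances

print(pairwise, per_iteration)
```

Even allowing k-means several dozen iterations, the hierarchical approach computes orders of magnitude more distances on a dataset of this size.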
This should not be a deterrent, though! Think about why – if a business is going to use a consumer segmentation model on its large user base for a year, the model could have considerable payoff across that period. Whether the underlying calculation to generate the model took 2.3 seconds or 24.9 seconds is immaterial to its purpose. Even if the slower model took 240 seconds, or even 240 minutes, that would still be irrelevant – if it is the better model, it should be the one used. The modeler could also manage the data volume by building the clustering model on a sample of the records.
Hierarchical clustering offers more flexibility with record-to-record distance metrics. Here, we stuck with Euclidean distance for records and Ward’s distance for clusters, but we encourage you to iterate and experiment with different combinations of these to see what you find.
With hierarchical clustering, a modeler can mix qualitative and quantitative variables as inputs, using a metric known as Gower’s distance.
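To show the idea behind Gower's distance, here is a hand-rolled sketch for two records with mixed features: numeric columns contribute a range-scaled absolute difference, categorical columns contribute a 0/1 mismatch, and the per-feature dissimilarities are averaged. The records, column split, and ranges are illustrative assumptions; in practice one would use a dedicated implementation (e.g. the third-party `gower` package) rather than this toy function.

```python
def gower_distance(a, b, num_idx, cat_idx, ranges):
    """Average per-feature dissimilarity: range-scaled absolute
    difference for numeric columns, 0/1 mismatch for categoricals."""
    parts = []
    for j in num_idx:
        parts.append(abs(a[j] - b[j]) / ranges[j])   # scaled to [0, 1]
    for j in cat_idx:
        parts.append(0.0 if a[j] == b[j] else 1.0)   # match / no match
    return sum(parts) / len(parts)

# Two hypothetical customer records: [age, income, region, plan]
x = [35, 60_000, "west", "basic"]
y = [45, 80_000, "east", "basic"]
ranges = {0: 50, 1: 100_000}   # observed ranges of the numeric columns

d = gower_distance(x, y, num_idx=[0, 1], cat_idx=[2, 3], ranges=ranges)
print(round(d, 3))   # (0.2 + 0.2 + 1.0 + 0.0) / 4 = 0.35
```

Because every feature contributes a value in [0, 1], quantitative and qualitative variables are put on equal footing, which is what lets hierarchical clustering accept mixed inputs.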