Super Nerdy: K-Means Clustering Of Distillery Profiles

Add it to the “terroir isn’t a thing in scotch/regions are meaningless” pile. Over at a big data blog, Luba Gloukhov did a k-means clustering of 86 whiskies.

What’s that, you ask?

K-means clustering is a technique for grouping the items in a data set into a chosen number (k) of clusters, based on mathematically calculated distances between their attributes. Essentially, the program assigns each item to the nearest of k cluster centers, moves each center to the average of the items assigned to it, and repeats until the groups stop changing. The items that end up together are the ones most alike.
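For the curious, here is a minimal sketch of that loop (Lloyd's algorithm, the standard k-means procedure) in plain Python. The whisky names and the three flavor attributes (smoky, sweet, medicinal, scored 0 to 4) are made up for illustration; this is not the code or data from Gloukhov's post. It seeds the centers with a simple farthest-point rule so the result is repeatable. Real libraries do this differently: scikit-learn's `KMeans`, for instance, defaults to k-means++ initialization.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster `points` (equal-length tuples) into k groups with Lloyd's algorithm."""
    # Farthest-first seeding: start from the first point, then repeatedly add the
    # point farthest from every center chosen so far (deterministic, not k-means++).
    centers = [tuple(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the clustering has converged
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Made-up scores for hypothetical attributes (smoky, sweet, medicinal).
# Not real distillery data.
whiskies = {
    "Peaty A": (4, 1, 4),
    "Peaty B": (4, 1, 3),
    "Sweet A": (0, 3, 0),
    "Sweet B": (1, 4, 0),
    "Sweet C": (0, 4, 1),
}
labels, centers = kmeans(list(whiskies.values()), k=2)
for name, label in zip(whiskies, labels):
    print(f"{name}: cluster {label}")
# labels == [0, 0, 1, 1, 1]
```

The two peaty drams land in one cluster and the three sweet ones in the other. With real data you would pick k yourself, and different starting centers can give different groupings.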

The same idea is more accessible in the form of David Wishart’s Whisky Classified, which groups distilleries by flavor profile. While Wishart’s is a great effort and certainly one of the best introductions to the concept, the challenge is getting enough data points.

It’d be interesting to see this approach applied to larger, more frequently updated data sets such as the Malt Maniacs’ data, though most of these lack an agreed-upon set of flavor variables to score.
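That shared vocabulary matters because k-means can only compare items scored on the same variables. Here is a minimal sketch of the preprocessing this implies. It assumes a hypothetical agreed list of four flavor variables and made-up tasting records (not the Malt Maniacs’ actual format): records that skip any agreed variable are dropped, and extra, unagreed notes are ignored.

```python
# Hypothetical shared vocabulary; a real project would need tasters to agree on one.
FLAVOR_VARIABLES = ("smoky", "sweet", "medicinal", "fruity")

def to_vector(scores):
    """Return scores as a fixed-order tuple, or None if any agreed variable is missing."""
    if not all(name in scores for name in FLAVOR_VARIABLES):
        return None
    # Extra keys (e.g. "honey" below) are ignored: only agreed variables count.
    return tuple(float(scores[name]) for name in FLAVOR_VARIABLES)

# Made-up tasting records: the second taster never scored "medicinal".
reviews = [
    {"whisky": "Dram A", "smoky": 3, "sweet": 1, "medicinal": 2, "fruity": 1},
    {"whisky": "Dram B", "smoky": 0, "sweet": 4, "fruity": 3, "honey": 2},
]

usable = {}
for record in reviews:
    vector = to_vector(record)
    if vector is not None:
        usable[record["whisky"]] = vector
# usable == {"Dram A": (3.0, 1.0, 2.0, 1.0)}
```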

If the concept is over your head or you don’t dig reading code samples, just look at the map plots, which show the flavor groups scattered pretty widely across Scotland. There’s obviously a cluster in Speyside, but a zoomed-in view of that region would be more useful.
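Eyeballing a map works, but you can also put the question to the numbers by cross-tabulating each distillery’s region against its cluster label. Here is a minimal sketch with invented region and label assignments (not results from Gloukhov’s post). A region whose distilleries spread across many clusters tells you little about flavor, while one that falls mostly into a single cluster would hint at a regional style.

```python
from collections import Counter, defaultdict

def clusters_by_region(regions, labels):
    """Count how many distilleries from each region landed in each cluster."""
    table = defaultdict(Counter)
    for region, label in zip(regions, labels):
        table[region][label] += 1
    return {region: dict(counts) for region, counts in table.items()}

# Invented example: five distilleries' regions and their k-means cluster labels.
regions = ["Speyside", "Speyside", "Speyside", "Islay", "Islay"]
labels = [0, 1, 2, 2, 2]

table = clusters_by_region(regions, labels)
# table == {"Speyside": {0: 1, 1: 1, 2: 1}, "Islay": {2: 2}}

# Number of distinct clusters each region's distilleries fall into.
spread = {region: len(counts) for region, counts in table.items()}
# spread == {"Speyside": 3, "Islay": 1}
```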

It’s an interesting start, and it certainly suggests there’s a lot more fun data mining to be done.