KMeans Domaining?

Geoff Elson 2 years ago in Resource Estimation updated by Scott P (Moderator / Admin (AUS)) 2 years ago 3

Thought I'd share a bit of code.

I am playing with KMeans clustering as another point of information for domaining, and the results so far have been eye opening.

For those who are unfamiliar, it's a machine learning algorithm that groups data into clusters by constantly reassessing the mean of each group. The most basic example uses two-dimensional x and y data and clusters nearby points together.
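
As a rough illustration of that idea, here is a minimal pure-Python sketch of the textbook k-means loop (Lloyd's algorithm) on made-up x and y points. It is not the script linked in this post; the function name `kmeans`, the seed, and the sample points are all assumptions for the example.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Assign each point to its nearest mean, then recompute the means, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old mean
        if new_centroids == centroids:  # converged: the means stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Two obvious groups of made-up (x, y) points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The first three points should end up sharing one label and the last three the other. Which group gets label 0 depends on the random starting points. The same loop works unchanged with more than two values per point.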

A few geo applications I've seen are alteration domains based on things like Ca, Mn, Fe etc., mineralogical and met domains based on geochem data. 

Anyway, there is a lot of info to be googled. I used it to look at clusters using 7 dimensions of data: Ag, Pb, Zn, Cu, Easting, Northing, and Elevation. The output was remarkably similar to the estimation groups I had chosen, and it showed likely offsets clear as day, along with other improvements on my groups.

Note the dark maroon and orange areas. I was unhappy about the histograms from my groupings because they were bimodal for copper. The KMeans cluster shown in dark pink produced really well-formed histograms and helps support the hypothesis that the two zones are chemically similar and probably should be grouped.

Here is the script, KMeans Script, if you are interested. Trigger warning: I am not a coder, and I'm sure it's far from perfect. Also, for the Python beginners, this won't be plug and play unless you get numpy and pandas. To be honest, that was the hardest part for me, as you can read about my confusion with pip in other posts here about scripting.

Improvements that I am not currently capable of:

  • Adding a first pass to optimize the number of clusters, then feeding it to the KMeans script
  • Filtering data in the file. I can't use As and Fe at the moment because there are nulls.
  • Avoiding going out to csv first. I couldn't quickly figure out how to stack the numpy array with the mm file.
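
For the nulls point in the list above, one possible approach is to drop any sample with a missing value in the columns being clustered before it reaches KMeans. Below is a minimal sketch using only the standard library on a made-up CSV; the column names, hole IDs, values, and the helper `complete_rows` are all hypothetical. If the data is already in a pandas DataFrame, `DataFrame.dropna(subset=[...])` does the same job.

```python
import csv
import io
import math

# Made-up assay export; blank cells stand in for missing As and Fe values.
raw = """hole_id,Ag,As,Fe
DH001,12.5,30.1,4.2
DH002,8.1,,3.9
DH003,15.0,22.4,
DH004,9.7,18.0,5.1
"""

fields = ["Ag", "As", "Fe"]  # columns that would be fed to the clustering


def complete_rows(text, fields):
    """Return (hole_id, values) for rows where every listed field is a real number."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            values = [float(row[name]) for name in fields]
        except ValueError:  # blank or non-numeric cell: skip this sample
            continue
        if any(math.isnan(v) for v in values):  # literal "nan" text: skip too
            continue
        kept.append((row["hole_id"], values))
    return kept


rows = complete_rows(raw, fields)
print([hole for hole, _ in rows])  # ['DH001', 'DH004']
```

Dropping rows is the simplest option, but it also discards the Ag value for those samples, so check how many samples are lost before clustering on what remains.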

Estimate Domains picked manually

Image 2654

Clusters with 10 groups

Image 2655

Likely Offset: Note that KMeans clustered the green data across the proposed fault despite the separation distance. I am not blindly following the computer here; this is something I proposed before trying KMeans, and I was blown away when I saw it pick it out.

Image 2656

Very interesting. Seeing so many applications in domaining.

Hi Geoff,

Check out hierarchical clustering. I generally find it produces clusters that are more similar to how I would manually cluster data.

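from scipy.cluster.hierarchy import fcluster, linkage

# data: array of shape (n_samples, n_features); n_clusters: number of groups wanted
Z = linkage(data, 'ward')  # build the cluster tree using Ward linkage
clusters = fcluster(Z, t=n_clusters, criterion='maxclust')  # cut it into at most n_clusters groups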

Also, I'd be careful about mixing coordinates and geochemical data. Most clustering algorithms are based on the distance between samples or the density of the data. Drill hole sample locations are generally clustered themselves, and that gets in the way of clustering on geochemistry.
Also, the magnitude of the values is important. Have you scaled the data? It looks to me as though those clusters are based almost entirely on X, Y, Z. Easting and Northing values that are orders of magnitude higher than assay values will dominate the clustering. Try clustering on XYZ alone and see if you get the same result.
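
One common way to scale is to convert each column to z-scores (subtract the mean, divide by the standard deviation) so that every variable contributes on a comparable footing. A minimal standard-library sketch with made-up numbers follows; the column names, values, and the helper `standardize` are assumptions for illustration only.

```python
import statistics


def standardize(columns):
    """Rescale each column to zero mean and unit standard deviation (z-scores)."""
    scaled = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        # A constant column carries no information; map it to zeros.
        scaled[name] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return scaled


# Made-up samples: a coordinate in metres next to two assay columns.
columns = {
    "Easting": [452100.0, 452150.0, 452300.0, 452320.0],
    "Ag": [12.0, 85.0, 14.0, 90.0],
    "Cu": [0.10, 0.90, 0.12, 1.10],
}
scaled = standardize(columns)
for name, values in scaled.items():
    print(name, [round(v, 2) for v in values])
```

After scaling, a difference of one unit means one standard deviation in every column, so Easting no longer outweighs Cu simply because its raw numbers are larger. Skewed assay data may also need a log transform first, which this sketch does not attempt.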
All the best

Hi Geoff,

Thanks for sharing - this is an interesting topic and is certainly a nice way to explore your data to help find trends and relationships. This is also something we are busy doing R&D on in Micromine right now. We hope to include some new features around this in a future release.