All set? Let’s now interpret and simplify the K-Means results.

The output of `ML.PREDICT` is complete, but sometimes we want something more direct.

The most common case: knowing only which group each data point belongs to

In most cases, we simply want to know which cluster each of our elements (pages, users, keywords, etc.) belongs to. We want that `CENTROID_ID` tag (the column `ML.PREDICT` returns for K-Means models) so we can analyze performance by group, segment campaigns, and so on.

We can simplify the previous query to obtain a cleaner table, containing our original data plus a column with the assigned cluster ID:
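A minimal sketch of that simplified query, under stated assumptions: the model name `my_project.my_dataset.kmeans_model`, the input table `my_project.my_dataset.pages_metrics`, and the output table `my_project.my_dataset.pages_with_cluster` are hypothetical placeholders, and the input table is assumed to contain the same feature columns the model was trained on. For K-Means models, `ML.PREDICT` returns the assigned cluster in a `CENTROID_ID` column, which is renamed to `cluster_id` here.

```sql
-- Hypothetical names: replace the model, input table, and output table
-- with your own project, dataset, and object names.
CREATE OR REPLACE TABLE `my_project.my_dataset.pages_with_cluster` AS
SELECT
  -- Keep every original column and drop the two prediction columns...
  * EXCEPT (CENTROID_ID, NEAREST_CENTROIDS_DISTANCE),
  -- ...then re-add only the assigned cluster under a friendlier name.
  CENTROID_ID AS cluster_id
FROM
  ML.PREDICT(
    MODEL `my_project.my_dataset.kmeans_model`,
    TABLE `my_project.my_dataset.pages_metrics`
  );
```

Materializing the result with `CREATE OR REPLACE TABLE` is one possible design choice: later analysis reads the stored table instead of calling the model again.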

The result of this query would be a table like this (fictional example for web pages):

This is much more manageable and will also save you costs, as you can now query only the data in that already calculated table without having to call the model each time. You can now perform a `GROUP BY cluster_id` to see the average metrics for each group, filter by cluster, etc.

When to use this? Almost always: to segment, label, create new dimensions in your dashboards, analyze aggregate performance by group…

Do clusters always have to be consecutive numbers? Don’t they have a descriptive name?

Not only do they not have names, but the number itself (1, 2, 3) means nothing. Clusters are just groupings. Once you have them, you should describe them so you can understand them better. How? By running a query that summarizes the average data for each group. This query groups by cluster ID with `GROUP BY` and returns the average of each clustered metric in each cluster. This way, each cluster gets a profile you can describe in words.

We can run a query on the table already generated in the previous step like this:
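A minimal sketch of such a query, assuming the clustered table is the hypothetical `my_project.my_dataset.pages_with_cluster` with a `cluster_id` column. The metric column names (`gsc_clicks`, `gsc_ctr`, `gsc_position`, `ga4_sessions`, `ga4_conversion_rate`, `ga4_avg_duration`) are illustrative assumptions, so swap in whatever features your model was actually trained on.

```sql
-- Hypothetical table and column names: adapt them to your own data.
SELECT
  cluster_id,
  COUNT(*) AS num_pages,                                  -- size of each cluster
  ROUND(AVG(gsc_clicks), 2) AS avg_gsc_clicks,            -- Search Console traffic
  ROUND(AVG(gsc_ctr), 4) AS avg_gsc_ctr,
  ROUND(AVG(gsc_position), 2) AS avg_gsc_position,
  ROUND(AVG(ga4_sessions), 2) AS avg_ga4_sessions,        -- GA4 engagement
  ROUND(AVG(ga4_conversion_rate), 4) AS avg_ga4_conversion_rate,
  ROUND(AVG(ga4_avg_duration), 2) AS avg_ga4_duration
FROM
  `my_project.my_dataset.pages_with_cluster`
GROUP BY
  cluster_id
ORDER BY
  cluster_id;
```

Including `COUNT(*)` alongside the averages helps when interpreting the profiles, since a cluster with very few rows can show extreme averages.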

This would compute the average value of each metric in each cluster. In other words, it would produce a table like the following:

Let’s go through them one by one, trying to understand the type of data each contains, and get creative (or use generative AI, of course).

  • Cluster 1: “Hidden Gems / SEO Potential.”
    Few landing pages, low GSC traffic, poor average ranking, but very good conversion rate and duration.
  • Cluster 2: “Problems / Needs Urgent Optimization.”
    Many landing pages, medium-low GSC traffic, poor ranking, decent GA4 sessions but low conversion and duration.
  • Cluster 3: “Stars.”
    Few landing pages, but they’re the top. Lots of GSC traffic, very good ranking, many GA4 sessions, good conversion and duration.
  • Cluster 4: “Very Specific Niche / High Value.”
    Few landing pages, very little GSC traffic but excellent CTR and position, low GA4 sessions but the best conversion rate and duration.
  • Cluster 5: “Highly Viewed Content / Low Conversion.”
    Plenty of landing pages, lots of GSC traffic and GA4 sessions, but low CTR, low conversion and short duration.

This analysis of averages is essential for interpreting the results of K-Means and deciding what actions to take for each discovered segment.
