As much as I try to explain everything step by step in this post, the truth is that until you run your own tests in BigQuery, you won’t fully understand it. That’s why I’m going to leave you with two simple queries. With these, you can create a K-means model with a simple cut and paste and see how it classifies fictitious data. Of course, for the sake of simplicity, these queries use a tiny amount of completely meaningless data, but they’ll get you up and running.
Step 1: Query to create a dummy K-means model on your dataset
Copy and paste this query into the BigQuery query editor after replacing the project and dataset with your own.
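As a rough sketch, a query of this kind might look like the one below. Everything named in it is a made-up placeholder: the project `my_project`, the dataset `my_dataset`, the model name `dummy_kmeans`, and the two columns `visits` and `purchases` with their six rows of invented values. Choosing two clusters is also just an assumption for illustration.

```sql
-- Minimal sketch: train a K-means model on six rows of invented data.
-- Replace my_project and my_dataset (hypothetical names) with your own.
CREATE OR REPLACE MODEL `my_project.my_dataset.dummy_kmeans`
OPTIONS (
  model_type = 'KMEANS',  -- unsupervised clustering
  num_clusters = 2        -- assumed number of clusters for this toy data
) AS
SELECT visits, purchases
FROM UNNEST([
  -- A group of "low activity" rows
  STRUCT(1.0 AS visits, 0.0 AS purchases),
  STRUCT(2.0, 1.0),
  STRUCT(1.0, 1.0),
  -- A group of "high activity" rows
  STRUCT(20.0, 8.0),
  STRUCT(22.0, 10.0),
  STRUCT(19.0, 9.0)
]);
```

The training data is written inline with `UNNEST`, so the sketch does not depend on any existing table. With real data you would replace the `SELECT` with a query over your own table.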
When you run the query, you’ll have to wait a few seconds. Even with this small sample, the algorithm needs to move the centroids into position step by step. When it finishes, you can see the details of the execution that created the model.
From that moment on, you will see the “Models” folder appear in your BigQuery explorer, and within it, the model you just created:
This means you can now use your model in any query and start seeing which cluster each row of data ends up in.
Step 2: Query to use the model and view its data
Now reopen the BigQuery query editor and run this query with ML.PREDICT to apply the already trained model.
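A prediction query of this kind could look roughly like the sketch below. It assumes a model called `dummy_kmeans` in the placeholder location `my_project.my_dataset`, trained on two numeric columns named `visits` and `purchases`. All of those names, and the two input rows, are invented for illustration.

```sql
-- Minimal sketch: assign two new rows to clusters with the trained model.
-- The input columns must match the feature names used in training.
SELECT *
FROM ML.PREDICT(
  MODEL `my_project.my_dataset.dummy_kmeans`,
  (
    SELECT 1.5 AS visits, 0.5 AS purchases   -- invented "low activity" row
    UNION ALL
    SELECT 21.0, 9.0                         -- invented "high activity" row
  )
);
```

The second argument can be any query or table that provides the same columns the model was trained on, so in practice you would point it at the data you want to classify.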
When you run it you will find a result like this:
This means that everything has gone well.
- CENTROID_ID: The cluster this row has been assigned to.
- Rest of the data: The NEAREST_CENTROIDS_DISTANCE array with the distance from that row to each of the nearest centroids, plus the input columns you passed in.
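Because the distances come back as an array, they are easier to read once flattened into one line per centroid. A possible way to do that is sketched here, again assuming the placeholder model `my_project.my_dataset.dummy_kmeans` and the invented columns `visits` and `purchases`.

```sql
-- Minimal sketch: flatten the distance array into one line per centroid.
WITH predictions AS (
  SELECT *
  FROM ML.PREDICT(
    MODEL `my_project.my_dataset.dummy_kmeans`,
    (SELECT 1.5 AS visits, 0.5 AS purchases)  -- invented input row
  )
)
SELECT
  p.visits,
  p.purchases,
  p.CENTROID_ID AS assigned_centroid,   -- cluster chosen for the row
  d.CENTROID_ID AS candidate_centroid,  -- each centroid in the array
  d.DISTANCE AS distance                -- distance from the row to that centroid
FROM predictions AS p
CROSS JOIN UNNEST(p.NEAREST_CENTROIDS_DISTANCE) AS d
ORDER BY distance;
```

The assigned centroid should be the one with the smallest distance, which makes this a handy sanity check on the model’s output.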