Unsupervised Learning Performance Metrics

Raghavi_bala
3 min read · Aug 3, 2022

Accuracy, TPR, FPR, FP, TP, Confusion Matrix… the list goes on and on and on. All of these are for supervised learning, where you have features and their corresponding labels.

But what do we use in the case of unsupervised learning, where we don't have labels, that is, no ground truth to compare with? That is what we're gonna find out here. We are going to focus on performance metrics for clustering algorithms.

I'll list the types of metrics first, and then explain each one meticulously.

They're broadly classified into three: Internal Validation, External Validation and Relative Validation.

External Validation

External validation is where we compare the clusters we obtained against external information we have about them. In this case we use metrics like Recall and Precision, which require true labels. Now the question may arise: if labels are available, how can we call this unsupervised learning? Well, this particular method is for clustering when the labels happen to be known.

And here arises the second question: if the labels are known, why not just perform supervised learning? Well, External Validation is mostly used while devising a new clustering technique, to determine its effectiveness.

External Validation may also include comparing the clusters we obtained with clusters formed manually by subject matter experts. As this requires manual work, we mostly don't use it in the real world.

F-measure, NMI (Normalized Mutual Information), Purity and Entropy are types of External Validation for evaluating clustering algorithms when the labels of the clusters are known. I'm not gonna dig deep into these methods right now (right now, to keep you guys hooked 😉). I'm going to focus on explaining the 3 methods first, and maybe do a part 2 explaining each of these separately.
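Just to give a flavour of what external validation looks like in practice, here's a minimal sketch with scikit-learn. The toy labels and cluster ids are made up for illustration; NMI comes straight from sklearn, and purity is computed by hand from the contingency matrix.

```python
# A minimal sketch, assuming we happen to know the true labels (y_true)
# and have predicted cluster assignments (y_pred). Toy data only.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])   # known labels
y_pred = np.array([1, 1, 0, 0, 0, 0, 2, 2, 2])   # cluster ids from our algorithm

# NMI: 1.0 means the clustering perfectly recovers the known labels
# (cluster ids don't need to match the label ids).
print("NMI:", normalized_mutual_info_score(y_true, y_pred))

# Purity: assign each cluster to its majority true label, then count
# the fraction of points that land in the "right" cluster.
cm = contingency_matrix(y_true, y_pred)
print("Purity:", cm.max(axis=0).sum() / cm.sum())
```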

Internal Validation

When it comes to internal validation, all the metrics revolve around 2 important terms:

Cohesion within each cluster

Separation between different clusters

Cohesion and Separation

Cohesion is the intra-cluster similarity. It can be computed by summing the similarity between each pair of records contained in that cluster.

Separation is the inter-cluster difference. It can be computed by summing the distance between each pair of records where the two records come from different clusters.
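To make those two definitions concrete, here's a tiny NumPy/SciPy sketch with two made-up clusters. Note I use distances for both quantities, so a smaller within-cluster sum means higher cohesion, while a larger between-cluster sum means better separation.

```python
# Toy sketch: two small clusters, Euclidean distance throughout.
import numpy as np
from scipy.spatial.distance import pdist, cdist

cluster_a = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]])
cluster_b = np.array([[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])

# Cohesion: sum over every pair of records inside the same cluster.
cohesion_a = pdist(cluster_a).sum()
cohesion_b = pdist(cluster_b).sum()

# Separation: sum over every pair whose two records come from different clusters.
separation = cdist(cluster_a, cluster_b).sum()

print(cohesion_a, cohesion_b, separation)
```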

Silhouette Coefficient, Calinski-Harabasz coefficient, Dunn Index, Xie-Beni score and Hartigan index are the different kinds of Internal Validation evaluation methods.
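Two of these are available directly in scikit-learn, so here's a quick hedged sketch of scoring a clustering with them (the toy blobs and KMeans are just my choices for illustration; Dunn, Xie-Beni and Hartigan aren't in scikit-learn).

```python
# Sketch: internal validation scores that need no true labels, only X and the cluster labels.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)          # toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print("Silhouette:", silhouette_score(X, labels))                     # closer to 1 is better
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))       # higher is better
```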

Relative Validation

The last kind of method, which is rarely discussed, is relative validation. As the name suggests, we compare the clusters we obtained with other clustering schemes. This method is also called Twin Sampling.

For some reason I find this method to be my favourite, and really simple, so I'm gonna explain this one in detail. Well, I know I'm being partial, but duh, everyone has favourites.

The whole process can be summarized into 4 simple steps.

  1. Creating a twin-sample of training data
  2. Performing unsupervised learning on twin-sample
  3. Importing results for twin-sample from training set
  4. Calculating similarity between two sets of results

Creating a twin-sample of training data

As the first step we are supposed to create a dataset that is similar to our dataset; this process is more like creating a validation set for our training set. The validation set is supposed to come from the same distribution as the training set and should cover most of the patterns observed in the training set.

Performing unsupervised learning on twin-sample

Now that we have a validation set, perform clustering on this dataset using another clustering algorithm, but keep the parameters, like the number of clusters and the type of distance measure, the same. We will get the clusters for the validation dataset.

Importing results for twin-sample from training set

Now, we can import the results we got when we performed clustering on the training dataset using the algorithm we selected. These are the clusters of the training dataset.

Calculating similarity between two sets of results

Now finally, after getting both sets of clusters, we compare them. The more similar the two cluster sets are, the more reliable our clustering is.

This method is as simple as getting a second opinion.
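Putting the four steps together, here's a minimal end-to-end sketch in Python with scikit-learn. The specific pieces, KMeans as the original algorithm, AgglomerativeClustering for the twin sample, and the adjusted Rand index for the comparison, are my own choices for illustration, not anything prescribed by the method.

```python
# Hedged sketch of the twin-sample idea on toy data; the "import results"
# step here simply reuses the model trained on the training set.
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# Step 1: carve out a twin sample from the same distribution as the training data.
X_train, X_twin = train_test_split(X, test_size=0.3, random_state=0)

# Step 2: cluster the twin sample with a different algorithm,
# keeping the number of clusters and the (Euclidean) distance the same.
twin_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_twin)

# Step 3: "import" the training results: use the clustering fit on the
# training set to assign the twin-sample points to its clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
imported_labels = kmeans.predict(X_twin)

# Step 4: compare the two sets of assignments; more agreement suggests a
# more reliable clustering (cluster ids don't need to match).
print("Adjusted Rand Index:", adjusted_rand_score(twin_labels, imported_labels))
```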

And with that, we've covered the Twin Sample method, and our tour of evaluation metrics.
