W3: P2: Optimal Clustering: Master the Art of Choosing the Perfect Number of Clusters!

Published: 27 October 2025
Channel: Computing For All

#Silhouette #SilhouetteScore #silhouetteCoefficient #Clustering #ClusterAssessment #ClusteringEvaluation
Today, we discuss a way to find the optimal number of clusters when no benchmark (ground-truth) data is available. The question is: given a dataset and no supervision, how can we figure out which number of clusters, k, gives the best result?

We apply the k-means clustering algorithm to the Pecan dataset. Using a score called the silhouette coefficient, we evaluate the clustering result for each candidate k to find the optimal number of clusters.
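
As a rough illustration of this idea, here is a minimal sketch that tries several values of k and keeps the one with the highest average silhouette score. It is not the code from the video: the data is a synthetic stand-in generated with make_blobs (not the Pecan dataset), and the range of k values and the random seeds are assumptions chosen only for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (NOT the Pecan dataset): four well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=0.8,
    random_state=0,
)

# Run k-means for each candidate k and record the average silhouette score.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest average silhouette score is our pick.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

The silhouette score needs at least two clusters, so the scan starts at k=2. On real data the peak is often less clear than on this toy example, so it helps to look at the whole list of scores and not only at the maximum.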

There are many other ways to evaluate clusters. Please note that whichever evaluation metric you use, it is always a good idea to inspect data points from different clusters and check why they ended up in different clusters.
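
One simple way to do this kind of inspection is to group the rows by cluster label and compare them. The sketch below assumes a tiny made-up table; the column names feature_a and feature_b are hypothetical and are not the columns of the Pecan dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Tiny made-up table; the column names are hypothetical.
df = pd.DataFrame(
    {
        "feature_a": [1.0, 1.2, 0.9, 8.0, 8.3, 7.9],
        "feature_b": [2.0, 2.1, 1.8, 9.0, 9.2, 8.8],
    }
)

# Attach a cluster label to every row.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(df[["feature_a", "feature_b"]])

# Per-cluster averages show which features separate the clusters.
summary = df.groupby("cluster").mean()
print(summary)

# A few example rows from each cluster, for a direct look at the data.
print(df.groupby("cluster").head(2))
```

Comparing the per-cluster averages with a few raw rows usually makes it clearer whether the clusters differ in a meaningful way or only in a way the score happens to reward.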

Clustering helps us get an initial idea about a dataset. It is often used for exploratory data analysis.

00:01 Problem description
00:55 Start of Python Coding
01:21 Dataset description
01:37 Reading from the data file using Python
02:09 DataFrame analysis
03:52 Removing a column from the DataFrame
05:47 K-means clustering using sklearn
07:53 How good the clustering result is: Silhouette coefficient or score
09:08 Choosing the right number of clusters by investigating Silhouette score
12:55 Writing data and results to a new CSV file
16:17 What did we observe?
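
The steps listed above (reading the data, removing a column, clustering, and writing the results to a new CSV file) can be sketched roughly as follows. This is not the code from the video: the inline CSV text, the column names (sample_id, feature_a, feature_b), the choice of k=2, and the output file name are all assumptions made for illustration.

```python
import io
import os
import tempfile

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical CSV content standing in for a real data file.
csv_text = """sample_id,feature_a,feature_b
s1,1.0,2.0
s2,1.2,2.1
s3,0.9,1.8
s4,8.0,9.0
s5,8.3,9.2
s6,7.9,8.8
"""

# Read the data into a DataFrame and take a first look.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Remove the identifier column so that only numeric features are clustered.
features = df.drop(columns=["sample_id"])

# Cluster the rows and store the label next to the original data.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(features)

# Write the data together with the cluster labels to a new CSV file.
out_path = os.path.join(tempfile.gettempdir(), "clustered_output.csv")
df.to_csv(out_path, index=False)

# Read the file back to confirm what was written.
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Keeping the identifier column in the saved file, even though it was excluded from clustering, makes it easier to trace each cluster label back to the original record.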

Please find the code and the Pecan dataset at the following link:
https://computing4all.com/data-scienc...

The following link contains the description of Silhouette score or coefficient along with many other clustering evaluation techniques:
https://computing4all.com/courses/int...
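
For readers who want to see the arithmetic, the standard definition of the silhouette coefficient of a point is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster. The sketch below computes this by hand on five made-up points and compares the result with scikit-learn's silhouette_samples. The points, labels, and the helper name silhouette_by_hand are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Five made-up points in two clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1, 1])


def silhouette_by_hand(X, labels):
    """Hypothetical helper: per-point silhouette using Euclidean distance."""
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # a single-point cluster gets a score of 0
        # a: mean distance to the other points in the same cluster
        # (the distance of the point to itself is 0, so divide by n_same - 1).
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            dist[i, labels == c].mean()
            for c in np.unique(labels)
            if c != labels[i]
        )
        scores[i] = (b - a) / max(a, b)
    return scores


manual = silhouette_by_hand(X, labels)
reference = silhouette_samples(X, labels)
print(manual)
print(reference)
```

The average of these per-point values is the overall silhouette score for a clustering, which is the number compared across different values of k.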

Videos that contain detailed theories on clustering are as follows:
Clustering -- An Introduction: 3.1 DS: Clustering -- An Introduction
k-means Clustering Algorithm in Detail: 3.2 DS: k-means Clustering Algorithm in De...

Thank you!

Dr. Shahriar Hossain
https://computing4all.com