Lab 7: Clustering with Hierarchical Methods — From Toy Data to Music Genres

Exploring unsupervised learning through hierarchical clustering, dendrograms, and a music genre classification dataset.

Introduction

Clustering is a form of unsupervised learning. Unlike supervised learning, where models learn from labeled examples, clustering groups data points by similarity (typically measured as a distance between their feature vectors) and can reveal hidden structure without any prior labels.

In this lab, I explored hierarchical clustering, starting from simple toy datasets to understand the mechanics, and then applying it to a larger dataset of music genres. Along the way, I visualized results with dendrograms and scatter plots, and learned how choices in distance metrics and linkage methods affect the outcome.
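To make the point about linkage and distance choices concrete, here is a minimal sketch using SciPy's linkage and fcluster functions on a hypothetical eight-point toy dataset (not the lab's data). It builds a full hierarchy with each linkage method, cuts it into at most two flat clusters, and repeats one method with a different distance metric so the labelings can be compared side by side.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data (not from the lab): a "chain" of six evenly spaced
# points on the x-axis and a separate pair of points further to the right.
X = np.array([
    [0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0],  # the chain
    [8, 0], [8, 1],                                  # the separate pair
], dtype=float)

def cut_into(X, k, method, metric="euclidean"):
    """Build the full merge tree, then cut it so that at most k flat clusters remain."""
    Z = linkage(X, method=method, metric=metric)
    return fcluster(Z, t=k, criterion="maxclust")

results = {}
for method in ["single", "complete", "average", "ward"]:
    results[method] = cut_into(X, 2, method)
    print(f"{method:>8}: {results[method]}")

# The distance metric is a separate choice from the linkage method.
# Ward linkage only supports Euclidean distance, so vary the metric with "average".
results["average/cityblock"] = cut_into(X, 2, "average", metric="cityblock")
print(f"average/cityblock: {results['average/cityblock']}")
```

Single linkage keeps the whole chain in one cluster because it only considers the closest pair of points between two clusters; methods that account for more of each cluster's spread may split the chain differently.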

Key Steps Covered

Introduction to Clustering
- Difference between agglomerative (bottom-up) and divisive (top-down) clustering.
Hierarchical Clustering on Toy Data
- Built dendrograms using SciPy’s linkage and dendrogram functions.
- Ran agglomerative clustering with scikit-learn’s AgglomerativeClustering.
Music Genre Dataset (Kaggle)
- Extracted features from the dataset’s 30-second clips, which span 10 genres.
- Normalized features with MinMaxScaler.
- Built dendrograms to visualize hierarchical structure.
- Clustered the data into 10 groups, matching the number of genres, and visualized the results.
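The workflow in the list above can be sketched end to end as follows. Because the Kaggle feature table is not reproduced here, the sketch substitutes a synthetic stand-in generated with scikit-learn's make_blobs (1,000 points, 20 features, 10 centres); the sizes, random seed, and the choice of Ward linkage are assumptions, not values taken from the lab notebook. To run it on real data, X_raw would be replaced by the dataset's numeric feature columns.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the music feature table (an assumption, not the Kaggle data):
# 1,000 samples with 20 numeric features drawn around 10 centres.
X_raw, true_groups = make_blobs(n_samples=1000, n_features=20, centers=10, random_state=42)

# 1. Rescale every feature to [0, 1] so features with large ranges don't dominate distances.
X = MinMaxScaler().fit_transform(X_raw)

# 2. SciPy: build the full merge tree. Each row of Z records one merge:
#    [cluster_a, cluster_b, merge_distance, size_of_new_cluster].
Z = linkage(X, method="ward")

# 3. Dendrogram structure. no_plot=True returns the layout without drawing it;
#    remove it (and call matplotlib.pyplot.show()) to render the figure.
#    truncate_mode="lastp" collapses the tree to its last p merged clusters.
tree = dendrogram(Z, truncate_mode="lastp", p=10, no_plot=True)

# 4. scikit-learn: cut the hierarchy into 10 flat clusters.
model = AgglomerativeClustering(n_clusters=10, linkage="ward")
labels = model.fit_predict(X)

print("linkage matrix shape:", Z.shape)  # (n_samples - 1, 4)
print("leaves in truncated dendrogram:", len(tree["ivl"]))
print("cluster sizes:", np.bincount(labels))
```

With 1,000 samples a full dendrogram is unreadable, which is why the sketch truncates it; the flat labels from AgglomerativeClustering can then be used to color a scatter plot of the data.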

Takeaway

Clustering showed how unsupervised methods can uncover structure in data, from toy datasets to real-world applications such as grouping music by genre. The dendrograms made it clear how hierarchical clustering builds groups step by step, offering both interpretability and flexibility when exploring unlabeled data.
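The step-by-step merging that a dendrogram depicts can also be written out directly. The sketch below is a deliberately naive single-linkage implementation, far slower than SciPy's and only suitable for tiny inputs, run on a hypothetical five-point dataset; the function name naive_single_linkage is illustrative and not part of any library. Its merge distances should match the third column of SciPy's linkage matrix for method="single".

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def naive_single_linkage(X):
    """Bottom-up (agglomerative) clustering: start with singletons and repeatedly
    merge the two closest clusters, where single linkage measures cluster
    distance as the closest pair of members."""
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        # Delete index b first (b > a) so index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges

# Hypothetical toy points (not from the lab).
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.5], [10.0, 0.0]])
merges = naive_single_linkage(X)
for step, (members, dist) in enumerate(merges, start=1):
    print(f"step {step}: merged into {members} at distance {dist:.2f}")
# step 1: merged into [0, 1] at distance 1.00
# step 2: merged into [2, 3] at distance 1.50
# step 3: merged into [0, 1, 2, 3] at distance 4.00
# step 4: merged into [0, 1, 2, 3, 4] at distance 6.00

# Cross-check against SciPy's merge heights.
print("SciPy single-linkage merge distances:", linkage(X, method="single")[:, 2])
```

Each printed step corresponds to one horizontal bar in the dendrogram, drawn at the height given by the merge distance.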

🔗 View the full Lab Notebook on GitHub
▶️ Run in Google Colab

Written on August 22, 2025