Lab 7: Clustering with Hierarchical Methods — From Toy Data to Music Genres
Exploring unsupervised learning through hierarchical clustering, dendrograms, and a music genre classification dataset.
Introduction
Unlike supervised learning, where models learn from labeled data, clustering belongs to the world of unsupervised learning. It groups data points based on similarity, revealing hidden patterns without prior labels.
In this lab, I explored hierarchical clustering, starting from simple toy datasets to understand the mechanics, and then applying it to a larger dataset of music genres. Along the way, I visualized results with dendrograms and scatter plots, and learned how choices in distance metrics and linkage methods affect the outcome.
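To make the effect of the linkage choice concrete, here is a minimal sketch using SciPy. The six 1-D points are made up for illustration and are not the lab's toy data; `cluster_sizes` is a hypothetical helper name. The `metric` argument is passed to `linkage` in the same call, so a different distance metric can be swapped in the same way.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up toy data (an assumption): six points on a line with steadily growing gaps.
X = np.array([[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]])


def cluster_sizes(method, n_clusters=2):
    """Build the tree with `method`, cut it into n_clusters, return sorted group sizes."""
    Z = linkage(X, method=method, metric="euclidean")
    # criterion="maxclust" cuts the tree so that at most n_clusters groups remain.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # fcluster labels start at 1, so drop the empty bin for label 0.
    return sorted(np.bincount(labels)[1:].tolist())


for method in ("single", "complete", "average"):
    print(method, cluster_sizes(method))
```

On these points, single linkage cuts at the largest gap and isolates the last point (group sizes 1 and 5), while complete linkage splits the same data into groups of 2 and 4.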
Key Steps Covered
- Introduction to Clustering
  - Difference between agglomerative (bottom-up) and divisive (top-down) clustering.
- Hierarchical Clustering on Toy Data
  - Built dendrograms using SciPy's linkage and dendrogram functions.
  - Implemented Agglomerative Clustering with scikit-learn.
- Music Genre Dataset (Kaggle)
  - Extracted features from a dataset of 30-second audio clips spanning 10 genres.
  - Normalized features with MinMaxScaler.
  - Built dendrograms to visualize hierarchical structure.
  - Clustered into 10 groups and visualized results.
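The music-genre steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the feature matrix is random placeholder data standing in for the Kaggle features, and the Ward linkage and the 200 x 6 shape are my own choices.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import MinMaxScaler

# Placeholder data (an assumption): 200 "tracks" x 6 numeric features on very
# different scales, standing in for the real audio features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6)) * np.array([1, 10, 100, 1000, 5, 50])

# Rescale every feature to [0, 1] so no single feature dominates the distances.
scaled = MinMaxScaler().fit_transform(features)

# Build the merge tree and compute a truncated dendrogram layout.
# no_plot=True only returns the layout; drop it to draw with matplotlib.
Z = linkage(scaled, method="ward")
tree = dendrogram(Z, no_plot=True, truncate_mode="lastp", p=10)

# Cut the hierarchy into 10 groups, one per expected genre.
model = AgglomerativeClustering(n_clusters=10, linkage="ward")
labels = model.fit_predict(scaled)

print(np.bincount(labels))  # number of tracks assigned to each cluster
```

Without the scaling step, the feature with the largest numeric range would dominate the Euclidean distances, which is why normalization comes before the tree is built.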
Takeaway
This lab showed how unsupervised methods can uncover structure in data, from toy datasets to real-world applications such as grouping music tracks by genre. The dendrograms made it clear how hierarchical clustering builds groups step by step, offering both interpretability and flexibility in exploring unlabeled data.
🔗 View the full Lab Notebook on GitHub
▶️ Run in Google Colab
