Lab 2: Evaluating Classification Models — Confusion Matrices, Errors, and ROC Curves

Learning how to measure model performance: from true/false positives to confusion matrices, error types, and ROC curves — with hands-on examples in Python.


Introduction

Building predictive models is only half the story; understanding how well they perform is equally important. In this lab, I explored the fundamentals of model evaluation, starting with simple counts like true positives and true negatives, and moving to more nuanced concepts like confusion matrices and ROC curves.

Through real-world inspired examples, including a medical diagnosis case study, I discovered why accuracy alone is often misleading, and why metrics like precision, recall (also called sensitivity), and specificity matter.
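As a quick illustration (the numbers here are made up, not data from the lab), a degenerate classifier that always predicts the majority class can score high accuracy while catching none of the positive cases:

```python
# Hypothetical screening scenario: 95 healthy patients, 5 sick.
# A "model" that labels everyone healthy reaches 95% accuracy,
# yet its recall (sensitivity) for the sick class is zero.
y_true = [0] * 95 + [1] * 5      # 0 = healthy, 1 = sick
y_pred = [0] * 100               # degenerate model: always predicts "healthy"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
recall = tp / (tp + fn)          # sensitivity for the sick class

print(accuracy)  # 0.95
print(recall)    # 0.0
```

This is exactly the trap the medical case study highlights: a 95%-accurate model that misses every sick patient.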


Key Steps Covered

  • True/False Positives and Negatives
    • Defined core evaluation terms.
  • Type I & II Errors
    • Type I errors (false positives) vs. Type II errors (false negatives) explained.
  • Confusion Matrix
    • Implemented from scratch and with sklearn.metrics.
    • Case study: evaluating different medical diagnostic classifiers.
  • Visualization
    • Generated heatmaps of confusion matrices with Matplotlib & Seaborn.
  • ROC Curve
    • Introduced the concept of trade-offs between sensitivity and specificity.
    • Compared models using ROC analysis.
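
The confusion-matrix step can be sketched as follows; the toy labels and predictions are hypothetical, but the row/column layout matches what `sklearn.metrics.confusion_matrix` returns for binary labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels and predictions (not the lab's data).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

def confusion_from_scratch(y_true, y_pred, n_classes=2):
    """Rows = actual class, cols = predicted class:
    [[TN, FP], [FN, TP]] for labels 0 and 1, like sklearn's layout."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm_manual = confusion_from_scratch(y_true, y_pred)
cm_sklearn = confusion_matrix(y_true, y_pred)
assert (cm_manual == cm_sklearn).all()   # both give the same matrix

tn, fp, fn, tp = cm_sklearn.ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

From the four cells, precision and recall fall out directly, which is what makes the matrix more informative than a single accuracy number.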
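A minimal heatmap sketch with Matplotlib and Seaborn, assuming a hypothetical 2x2 matrix `cm` (the counts are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical confusion matrix in [[TN, FP], [FN, TP]] layout.
cm = np.array([[50, 3],
               [7, 40]])

ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                 xticklabels=["pred 0", "pred 1"],
                 yticklabels=["true 0", "true 1"])
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
plt.savefig("confusion_heatmap.png")
```

The `annot=True, fmt="d"` pair prints the raw counts inside each cell, which makes the off-diagonal errors easy to spot at a glance.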
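For the ROC step, here is a sketch comparing two hypothetical classifiers via `sklearn.metrics.roc_curve` and `roc_auc_score` (the score vectors are invented, not from the lab):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Same true labels, scores from two hypothetical classifiers.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores_a = np.array([0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9])  # mostly ranks positives higher
scores_b = np.array([0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5])  # near-random ranking

# Each point on the curve is one threshold: trading false positive
# rate (1 - specificity) against true positive rate (sensitivity).
fpr_a, tpr_a, thresholds = roc_curve(y_true, scores_a)

auc_a = roc_auc_score(y_true, scores_a)
auc_b = roc_auc_score(y_true, scores_b)
print(auc_a, auc_b)  # classifier A clearly dominates; B sits near 0.5 (chance)
```

Because AUC summarizes the whole curve, it lets you compare models without committing to a single decision threshold, which is the core of the ROC comparison in this lab.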

Takeaway

This lab emphasized that evaluation goes beyond raw accuracy. By learning how to interpret confusion matrices and ROC curves, I gained tools to select the right model for the right context — especially in high-stakes applications like medical testing, where false positives and negatives carry very different consequences.


🔗 View the full Lab Notebook on GitHub
▶️ Run in Google Colab

Written on August 22, 2025