Lab 2: Evaluating Classification Models — Confusion Matrices, Errors, and ROC Curves

Learning how to measure model performance: from true/false positives to confusion matrices, error types, and ROC curves — with hands-on examples in Python.


Introduction

Building predictive models is only half the story; understanding how well they perform is equally important. In this lab, I explored the fundamentals of model evaluation, starting with simple counts like true positives and true negatives, and moving to more nuanced concepts like confusion matrices and ROC curves.

Through real-world inspired examples, including a medical diagnosis case study, I discovered why accuracy alone is often misleading, and why metrics like precision, recall (also called sensitivity), and specificity matter.
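As a quick illustration (the numbers here are made up, not data from the lab), a degenerate classifier that always predicts the majority class can score high accuracy while catching none of the positive cases:

```python
# Hypothetical screening scenario: 95 healthy patients, 5 sick.
# A "model" that labels everyone healthy reaches 95% accuracy,
# yet its recall (sensitivity) for the sick class is zero.
y_true = [0] * 95 + [1] * 5      # 0 = healthy, 1 = sick
y_pred = [0] * 100               # degenerate model: always predicts "healthy"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
recall = tp / (tp + fn)          # sensitivity for the sick class

print(accuracy)  # 0.95
print(recall)    # 0.0
```

This is exactly the trap the medical case study highlights: a 95%-accurate model that misses every sick patient.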


Key Steps Covered

  • True/False Positives and Negatives
    • Defined core evaluation terms.
  • Type I & II Errors
    • Type I errors (false positives) vs. Type II errors (false negatives) explained.
  • Confusion Matrix
    • Implemented from scratch and with sklearn.metrics.
    • Case study: evaluating different medical diagnostic classifiers.
  • Visualization
    • Generated heatmaps of confusion matrices with Matplotlib & Seaborn.
  • ROC Curve
    • Introduced the concept of trade-offs between sensitivity and specificity.
    • Compared models using ROC analysis.
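
The confusion-matrix step can be sketched as follows; the toy labels and predictions are hypothetical, but the row/column layout matches what `sklearn.metrics.confusion_matrix` returns for binary labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels and predictions (not the lab's data).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

def confusion_from_scratch(y_true, y_pred, n_classes=2):
    """Rows = actual class, cols = predicted class:
    [[TN, FP], [FN, TP]] for labels 0 and 1, like sklearn's layout."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm_manual = confusion_from_scratch(y_true, y_pred)
cm_sklearn = confusion_matrix(y_true, y_pred)
assert (cm_manual == cm_sklearn).all()   # both give the same matrix

tn, fp, fn, tp = cm_sklearn.ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

From the four cells, precision and recall fall out directly, which is what makes the matrix more informative than a single accuracy number.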
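A minimal heatmap sketch with Matplotlib and Seaborn, assuming a hypothetical 2x2 matrix `cm` (the counts are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical confusion matrix in [[TN, FP], [FN, TP]] layout.
cm = np.array([[50, 3],
               [7, 40]])

ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                 xticklabels=["pred 0", "pred 1"],
                 yticklabels=["true 0", "true 1"])
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
plt.savefig("confusion_heatmap.png")
```

The `annot=True, fmt="d"` pair prints the raw counts inside each cell, which makes the off-diagonal errors easy to spot at a glance.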
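For the ROC step, here is a sketch comparing two hypothetical classifiers via `sklearn.metrics.roc_curve` and `roc_auc_score` (the score vectors are invented, not from the lab):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Same true labels, scores from two hypothetical classifiers.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores_a = np.array([0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9])  # mostly ranks positives higher
scores_b = np.array([0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5])  # near-random ranking

# Each point on the curve is one threshold: trading false positive
# rate (1 - specificity) against true positive rate (sensitivity).
fpr_a, tpr_a, thresholds = roc_curve(y_true, scores_a)

auc_a = roc_auc_score(y_true, scores_a)
auc_b = roc_auc_score(y_true, scores_b)
print(auc_a, auc_b)  # classifier A clearly dominates; B sits near 0.5 (chance)
```

Because AUC summarizes the whole curve, it lets you compare models without committing to a single decision threshold, which is the core of the ROC comparison in this lab.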

Takeaway

This lab emphasized that evaluation goes beyond raw accuracy. By learning how to interpret confusion matrices and ROC curves, I gained tools to select the right model for the right context — especially in high-stakes applications like medical testing, where false positives and negatives carry very different consequences.


🔗 View the full Lab Notebook on GitHub
▶️ Run in Google Colab

Written on August 22, 2025