Lab 1: Getting Started with Data Mining in Python — NumPy Foundations

Kickstarting my data mining journey with Python: setting up the environment, exploring NumPy arrays, and understanding the foundations of multidimensional data.


Introduction

This first lab in my data mining series covers the essential Python environment setup and then dives into NumPy, the core library for numerical computing in Python. Arrays are the backbone of data manipulation in Python, and mastering them early is crucial for everything that follows, from classical machine learning to deep learning.
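
As a quick sanity check on the setup, here is a minimal sketch that reports which of the lab's libraries can be found in the current environment. The helper name `check_environment` and the list `LIBRARIES` are my own for illustration, not something from the lab notebook. The only assumption is the import names, where Scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names for the libraries used in this series.
# Note: Scikit-learn is installed as "scikit-learn" but imported as "sklearn".
LIBRARIES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(names):
    """Return a dict mapping each import name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_environment(LIBRARIES)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Anything reported as missing can usually be installed with pip before moving on.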

In this post, I’ll walk through creating arrays, examining their properties, and working with shapes, dimensions, and sizes — the building blocks of structured data analysis.
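
To make that concrete, here is a small sketch of the four kinds of arrays and their three basic properties. The variable names and values are illustrative choices of mine, not taken from the notebook.

```python
import numpy as np

vector = np.array([1, 2, 3, 4])               # 1D array (vector)
column = vector.reshape(-1, 1)                # column vector: 4 rows, 1 column
matrix = np.array([[1, 2, 3], [4, 5, 6]])     # 2D array (matrix): 2 rows, 3 columns
tensor = np.arange(24).reshape(2, 3, 4)       # 3D array: 2 blocks of 3x4

arrays = {"vector": vector, "column": column, "matrix": matrix, "tensor": tensor}
for name, arr in arrays.items():
    # ndim = number of axes, shape = length along each axis,
    # size = total number of elements (the product of the shape)
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}, size={arr.size}")
```

Note that the 1D vector has shape `(4,)` while the column vector has shape `(4, 1)`: they hold the same values but differ in dimensionality.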


Key Steps Covered

  • Environment Setup: Installing key libraries like NumPy, Pandas, Matplotlib, and Scikit-learn.
  • Introduction to Arrays:
    • 1D arrays (vectors)
    • Column vectors
    • 2D arrays (matrices)
    • 3D arrays (tensors)
  • Array Properties: Understanding ndim (number of axes), shape (length along each axis), and size (total number of elements).
  • Special Arrays: Creating arrays filled with zeros or ones (np.zeros, np.ones), useful in classification tasks.
  • Indexing & Accessing Elements: Retrieving specific values from arrays.
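
The last two steps in the list can be sketched as follows. This is a minimal illustration with made-up values, assuming only standard NumPy calls (`np.zeros`, `np.ones`, and basic indexing), and is not the notebook's exact code.

```python
import numpy as np

# Special arrays: the shape is passed as a tuple.
zeros = np.zeros((2, 3))             # 2x3 array of 0.0 (float by default)
ones = np.ones((3, 1), dtype=int)    # 3x1 column of integer 1s

# Indexing: rows first, then columns, both starting at 0.
matrix = np.array([[10, 20, 30],
                   [40, 50, 60]])

first = matrix[0, 0]         # element in row 0, column 0 -> 10
last_row = matrix[-1]        # negative index counts from the end -> [40 50 60]
second_col = matrix[:, 1]    # every row, column 1 -> [20 50]

print(zeros.shape, ones.shape)
print(first, last_row, second_col)
```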

Takeaway

This lab laid the foundation for future work in data mining by making me comfortable with arrays, a fundamental concept for data representation in Python. Arrays are the language of machine learning models, and knowing how to manipulate them is the first step toward meaningful analysis.


🔗 View the full Lab Notebook on GitHub
▶️ Run in Google Colab

Written on August 22, 2025