{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "

Neural Networks using PyTorch: Recognition of Handwritten Digits

\n", "

\n", "Nazar Khan\n", "
CVML Lab\n", "
University of The Punjab\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch provides a flexible and powerful deep learning framework with easy-to-use tools for building and training neural networks. We’ll walk through the following steps:\n", "\n", "1. Loading and preparing data (using `torchvision`).\n", "2. Defining the neural network architecture.\n", "3. Specifying the loss function and optimizer.\n", "4. Training the model.\n", "5. Evaluating the model.\n", "6. Visualizing performance.\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Step 0: Installing PyTorch\n", "First, make sure you have PyTorch and Torchvision installed. You can install it by running:\n", "\n", "```bash\n", "pip install torch torchvision\n", "```\n", "\n", "Install Torchmetrics and Seaborn packages as well\n", "\n", "```bash\n", "pip install seaborn torchmetrics\n", "```\n", "\n", "I work on Ubuntu. For me, `torch` and `torchvision` did not run correctly. There was an issue with the Math Kernel Library (MKL). I was able to solve it by making a new Python environment with Python version 3.12 and using OpenBLAS instead of MKL. I named this new environment `cvml`. I used the following commands:\n", "\n", "```bash\n", "conda create --name cvml python=3.12\n", "conda activate cvml\n", "conda install numpy matplotlib scikit-learn seaborn ipython ipykernel blas=*=openblas -c conda-forge\n", "pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu\n", "pip3 install torchmetrics\n", "```\n", "\n", "**
If you encounter any problems installing or successfully running `torch` or `torchvision`, please post on the Google Classroom immediately.**
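\n", "\n", "To confirm that the installation works, you can run the following quick sanity check. It is a minimal sketch that only prints library versions and reports whether a CUDA GPU is visible:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torchvision\n", "import torchmetrics\n", "\n", "# Print library versions and check GPU visibility\n", "print('torch:', torch.__version__)\n", "print('torchvision:', torchvision.__version__)\n", "print('torchmetrics:', torchmetrics.__version__)\n", "print('CUDA available:', torch.cuda.is_available())\n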
**" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Loading the Data\n", "In this tutorial, we’ll use the `MNIST` dataset, which consists of 28x28 grayscale images of handwritten digits (0-9).\n", "The `MNIST` dataset is available as part of the larger `EMNIST` dataset. Please read [https://biometrics.nist.gov/cs_links/EMNIST/Readme.txt](https://biometrics.nist.gov/cs_links/EMNIST/Readme.txt) carefully to understand the dataset. You **must always** understand the data that you're working with.\n", "\n", "`torchvision` provides utilities to load and preprocess datasets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "from torch.utils.data import DataLoader\n", "import matplotlib.pyplot as plt\n", "from torchmetrics import ConfusionMatrix\n", "import seaborn as sns\n", "\n", "# Set device to GPU if available\n", "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "\n", "# Define transformations for the dataset (normalize the images)\n", "transform = transforms.Compose([\n", " transforms.ToTensor(), # Convert to tensor\n", " transforms.Normalize((0.5,), (0.5,)) # Normalize to range [-1, 1]\n", "])\n", "\n", "# Download and load the EMNIST dataset\n", "train_dataset = torchvision.datasets.EMNIST(root='./data', split='mnist', train=True, download=True, transform=transform)\n", "test_dataset = torchvision.datasets.EMNIST(root='./data', split='mnist', train=False, download=True, transform=transform)\n", "\n", "# Data loaders\n", "train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", "test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", "\n", "# Visualize some training images\n", "def visualize_samples():\n", " examples = iter(train_loader)\n", " example_data, example_targets = next(examples) #.next()\n", "\n", " for i in range(6):\n", " plt.subplot(2, 3, i+1)\n", " plt.imshow(torch.transpose(example_data[i][0],1,0), cmap='gray')\n", " plt.title(f'Label: {example_targets[i].item()}')\n", " plt.axis('off')\n", " plt.show()\n", "\n", "visualize_samples()\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Defining the Neural Network Architecture\n", "Now, let’s define a simple feedforward neural network using `nn.Module`. In PyTorch, `nn.Module` is the base class for all neural network components, such as layers and models. It provides the foundational structure for building and organizing neural networks in PyTorch, handling parameter management, forward passes, and modularity.\n", "\n", "Our network will have an input layer, one hidden layer, and an output layer for classifying digits (0-9)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class SimpleNeuralNet(nn.Module):\n", " def __init__(self, input_size=28*28, hidden_size=128, num_classes=10):\n", " super(SimpleNeuralNet, self).__init__()\n", " self.fc1 = nn.Linear(input_size, hidden_size) # First fully connected layer\n", " self.relu = nn.ReLU() # Activation function\n", " self.fc2 = nn.Linear(hidden_size, num_classes) # Second fully connected layer (output)\n", "\n", " def forward(self, x):\n", " x = x.view(-1, 28*28) # Flatten the input\n", " x = self.fc1(x)\n", " x = self.relu(x)\n", " x = self.fc2(x)\n", " return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Specifying the Loss Function and Optimizer\n", "In PyTorch, you need to define a loss function and an optimizer for training. We’ll use **cross-entropy loss** for classification and **Adam** optimizer.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Instantiate the model, loss function, and optimizer\n", "model = SimpleNeuralNet().to(device)\n", "criterion = nn.CrossEntropyLoss() # Loss function for multi-class classification\n", "optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam optimizer\n", "print(\"Hypothesis\\n\", model)\n", "print(\"\\nObjective Function:\\n\", criterion)\n", "print(\"\\nOptimizer:\\n\", optimizer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Training the Model\n", "We’ll now train the model by iterating through the training data, performing forward passes, computing the loss, and updating the weights using backpropagation. We’ll also calculate the validation loss and accuracy after each epoch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Function to train the neural network\n", "def train_model(model, train_loader, criterion, optimizer, num_epochs=5):\n", " for epoch in range(num_epochs):\n", " model.train() # Set model to training mode\n", " running_loss = 0.0\n", " for batch_idx, (inputs, labels) in enumerate(train_loader):\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " \n", " # Forward pass\n", " outputs = model(inputs)\n", " loss = criterion(outputs, labels)\n", " \n", " # Backward pass and optimization\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", " \n", " running_loss += loss.item()\n", " \n", " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", "\n", "# Train the model\n", "train_model(model, train_loader, criterion, optimizer, num_epochs=5)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Evaluating the Model\n", "After training, we can evaluate the model’s performance on the test dataset to see how well it generalizes to unseen data.\n", "\n", "We will compute the *accuracy* of the predicted labels.\n", "\n", "We will plot the *confusion matrix* as well. The entry at row $i$ and column $j$ of the confusion matrix shows how many times a sample from class $i$ was predicted as belonging to class $j$. 
, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Evaluating the Model\n", "After training, we can evaluate the model’s performance on the test dataset to see how well it generalizes to unseen data.\n", "\n", "We will compute the *accuracy* of the predicted labels.\n", "\n", "We will plot the *confusion matrix* as well. The entry at row $i$ and column $j$ of the confusion matrix shows how many times a sample from class $i$ was predicted as belonging to class $j$. For example, the entry at row 4, column 9 counts how many 4s were misclassified as 9s. Ideally, the confusion matrix should contain non-zero values only on the main diagonal.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def evaluate_model(model, test_loader):\n", "    model.eval()  # Set model to evaluation mode\n", "    correct = 0\n", "    total = 0\n", "    pred = []\n", "    actual = []\n", "    with torch.no_grad():  # Disable gradient calculation for evaluation\n", "        for inputs, labels in test_loader:  # Pick a batch of test samples\n", "            inputs, labels = inputs.to(device), labels.to(device)\n", "            outputs = model(inputs)  # Forward-propagate the whole batch\n", "            _, predicted = torch.max(outputs, 1)  # Compute predictions for the whole batch\n", "            total += labels.size(0)\n", "            correct += (predicted == labels).sum().item()\n", "            pred.append(predicted)\n", "            actual.append(labels)\n", "\n", "    acc = 100 * correct / total\n", "    print(f'Accuracy of the model on the test images: {acc:.2f}%')\n", "\n", "    # Compute the confusion matrix (move tensors to CPU so torchmetrics and seaborn can use them)\n", "    num_classes = model.fc2.out_features\n", "    confmat = ConfusionMatrix(task=\"multiclass\", num_classes=num_classes)\n", "    conf_matrix = confmat(torch.cat(pred, dim=0).cpu(), torch.cat(actual, dim=0).cpu())\n", "\n", "    # Plotting the confusion matrix\n", "    plt.figure(figsize=(8, 6))\n", "    sns.heatmap(conf_matrix, annot=True, fmt=\"d\", cmap=\"Blues\", cbar=False,\n", "                xticklabels=[f'{i}' for i in range(num_classes)],\n", "                yticklabels=[f'{i}' for i in range(num_classes)])\n", "    plt.xlabel(\"Predicted\")\n", "    plt.ylabel(\"Ground Truth\")\n", "    plt.title(\"Confusion Matrix\")\n", "    plt.show()\n", "\n", "# Evaluate the model\n", "evaluate_model(model, test_loader)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "After looking at your confusion matrix, please answer the following questions.\n", "- How many times was 4 misclassified as 9?\n", "- How many times was 9 misclassified as 4?" ] }
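, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You can cross-check your reading of the heatmap programmatically. The cell below is a small sketch that recomputes the predictions (the tensors inside `evaluate_model` are local to that function) and counts the two specific confusions asked about above.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Collect predictions over the whole test set once\n", "model.eval()\n", "preds, truths = [], []\n", "with torch.no_grad():\n", "    for inputs, labels in test_loader:\n", "        inputs, labels = inputs.to(device), labels.to(device)\n", "        preds.append(model(inputs).argmax(dim=1))\n", "        truths.append(labels)\n", "preds, truths = torch.cat(preds), torch.cat(truths)\n", "\n", "# Count how often a true 4 was predicted as 9, and vice versa\n", "print('4 misclassified as 9:', ((truths == 4) & (preds == 9)).sum().item())\n", "print('9 misclassified as 4:', ((truths == 9) & (preds == 4)).sum().item())\n" ] }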
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def visualize_predictions(model, test_loader):\n", " examples = iter(test_loader)\n", " example_data, example_targets = next(examples) #.next()\n", " \n", " with torch.no_grad():\n", " example_data = example_data.to(device)\n", " outputs = model(example_data)\n", " _, predicted = torch.max(outputs, 1)\n", " \n", " # Plot 6 test images along with their predicted and true labels\n", " for i in range(6):\n", " plt.subplot(2, 3, i+1)\n", " plt.imshow(torch.transpose(example_data[i][0],1,0).cpu().reshape(28, 28), cmap='gray')\n", " plt.title(f'Pred: {predicted[i].item()}, True: {example_targets[i].item()}')\n", " plt.axis('off')\n", " plt.show()\n", "\n", "# Visualize predictions\n", "visualize_predictions(model, test_loader)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "This tutorial provided a step-by-step guide to building a simple neural network in PyTorch:\n", "\n", "- Loading the `EMNIST` dataset using `torchvision`.\n", "- Defining a simple neural network using `nn.Module`.\n", "- Specifying a loss function and optimizer.\n", "- Training the model using backpropagation.\n", "- Evaluating the model’s accuracy on test data.\n", "- Analyzing classification results using the confusion matrix.\n", "- Visualizing some predictions made by the trained model.\n", "\n", "PyTorch makes it easy to experiment with different neural network architectures and modify training procedures. You can further explore by adding more layers, using different activation functions, or experimenting with different datasets." ] } ], "metadata": { "kernelspec": { "display_name": "cvml_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }