{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "

Transformer using PyTorch: Sentiment Classification in Sentences of Text

\n", "

\n", "Nazar Khan\n", "
CVML Lab\n", "
University of The Punjab\n", "

" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### **Introduction**\n", "Transformers are a type of deep learning architecture introduced in the seminal paper *\"Attention is All You Need\"* (Vaswani et al., 2017). They have revolutionized natural language processing (NLP) and are the foundation for models like GPT, BERT, and T5. Transformers use a mechanism called **self-attention** to weigh the importance of different tokens (words) in a sequence, making them effective for sequence-to-sequence tasks.\n", "\n", "Our focus will be on understanding the key concepts, step-by-step implementation, and practical applications.\n", "\n", "---\n", "\n", "### **Key Concepts**\n", "1. **Self-Attention**: Computes relationships between all tokens in a sequence.\n", "2. **Multi-Head Attention**: Improves the model's ability to focus on different parts of the sequence.\n", "3. **Positional Encoding**: Injects information about token positions since transformers are permutation-invariant.\n", "4. **Feed-Forward Networks**: Applies transformations independently to each token.\n", "5. **Layer Normalization**: Stabilizes training.\n", "6. **Residual Connections**: Helps in learning deep architectures by mitigating vanishing gradients.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step-by-Step Implementation**\n", "\n", "#### **1. Import Libraries**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import math" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **2. Define the Scaled Dot-Product Attention**\n", "This is the core operation of self-attention." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def scaled_dot_product_attention(query, key, value, mask=None):\n", " \"\"\"\n", " Compute the attention weights and output.\n", " \"\"\"\n", " # Calculate scores\n", " scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1))\n", "\n", " # Apply mask (optional)\n", " if mask is not None:\n", " scores = scores.masked_fill(mask == 0, float('-inf'))\n", "\n", " # Compute softmax to get attention weights\n", " attention_weights = F.softmax(scores, dim=-1)\n", "\n", " # Multiply weights by values\n", " output = torch.matmul(attention_weights, value)\n", " return output, attention_weights" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "---\n", "\n", "#### **3. 
Implement Multi-Head Attention**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class MultiHeadAttention(nn.Module):\n", " def __init__(self, embed_size, num_heads):\n", " super(MultiHeadAttention, self).__init__()\n", " assert embed_size % num_heads == 0, \"Embed size must be divisible by num_heads\"\n", " self.head_dim = embed_size // num_heads\n", " self.num_heads = num_heads\n", "\n", " self.query = nn.Linear(embed_size, embed_size)\n", " self.key = nn.Linear(embed_size, embed_size)\n", " self.value = nn.Linear(embed_size, embed_size)\n", " self.fc_out = nn.Linear(embed_size, embed_size)\n", "\n", " def forward(self, x, mask=None):\n", " batch_size = x.shape[0]\n", " query = self.query(x)\n", " key = self.key(x)\n", " value = self.value(x)\n", "\n", " # Reshape for multi-heads\n", " query = query.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)\n", " key = key.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)\n", " value = value.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)\n", "\n", " # Perform scaled dot-product attention\n", " out, _ = scaled_dot_product_attention(query, key, value, mask)\n", "\n", " # Concatenate heads\n", " out = out.transpose(1, 2).contiguous().view(batch_size, -1, self.num_heads * self.head_dim)\n", "\n", " return self.fc_out(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "---\n", "\n", "#### **4. Add Positional Encoding**\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "class PositionalEncoding(nn.Module):\n", " def __init__(self, embed_size, max_len=5000):\n", " super(PositionalEncoding, self).__init__()\n", " pe = torch.zeros(max_len, embed_size)\n", " position = torch.arange(0, max_len).unsqueeze(1).float()\n", " div_term = torch.exp(torch.arange(0, embed_size, 2).float() * -(math.log(10000.0) / embed_size))\n", " pe[:, 0::2] = torch.sin(position * div_term)\n", " pe[:, 1::2] = torch.cos(position * div_term)\n", " self.pe = pe.unsqueeze(0)\n", "\n", " def forward(self, x):\n", " return x + self.pe[:, :x.size(1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **5. Define the Transformer Block**\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "class TransformerBlock(nn.Module):\n", " def __init__(self, embed_size, num_heads, ff_hidden_dim, dropout):\n", " super(TransformerBlock, self).__init__()\n", " self.attention = MultiHeadAttention(embed_size, num_heads)\n", " self.norm1 = nn.LayerNorm(embed_size)\n", " self.norm2 = nn.LayerNorm(embed_size)\n", "\n", " self.feed_forward = nn.Sequential(\n", " nn.Linear(embed_size, ff_hidden_dim),\n", " nn.ReLU(),\n", " nn.Linear(ff_hidden_dim, embed_size),\n", " )\n", " self.dropout = nn.Dropout(dropout)\n", "\n", " def forward(self, x, mask=None):\n", " attention = self.attention(x, mask)\n", " x = self.norm1(x + attention)\n", " forward = self.feed_forward(x)\n", " x = self.norm2(x + self.dropout(forward))\n", " return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **6. 
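{ "cell_type": "markdown", "metadata": {}, "source": [ "A quick smoke test (illustrative; the sizes are arbitrary): a `TransformerBlock` maps a `(batch, seq_len, embed_size)` tensor to one of the same shape, so blocks can be stacked freely." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative smoke test: the block preserves the input shape\n", "block = TransformerBlock(embed_size=32, num_heads=4, ff_hidden_dim=64, dropout=0.1)\n", "x = torch.randn(2, 10, 32)\n", "print(block(x).shape)  # torch.Size([2, 10, 32])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **6. 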
Build a Transformer Encoder**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "class TransformerEncoder(nn.Module):\n", " def __init__(self, embed_size, num_heads, ff_hidden_dim, num_layers, dropout, vocab_size, max_len):\n", " super(TransformerEncoder, self).__init__()\n", " self.embed_size = embed_size\n", " self.word_embedding = nn.Embedding(vocab_size, embed_size)\n", " self.position_embedding = PositionalEncoding(embed_size, max_len)\n", " self.layers = nn.ModuleList(\n", " [TransformerBlock(embed_size, num_heads, ff_hidden_dim, dropout) for _ in range(num_layers)]\n", " )\n", " self.dropout = nn.Dropout(dropout)\n", "\n", " def forward(self, x, mask=None):\n", " x = self.word_embedding(x)\n", " x = self.position_embedding(x)\n", " x = self.dropout(x)\n", "\n", " for layer in self.layers:\n", " x = layer(x, mask)\n", " return x" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **7. Application: Sentiment Classification**\n", "After building the encoder, you can use it for tasks like text classification, machine translation, or summarization. Here's a **hands-on demonstration** for sentiment classification using the transformer-based model that we have built from scratch. The text dataset used is IMDB (available via `torchtext`)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "---\n", "\n", "### **Hands-On Demonstration**\n", "#### **Task**: Predict Sentiment on a Text Dataset\n", "- Use the IMDB dataset. You can download it from here.\n", "- Preprocess text to numerical tokens.\n", "- Train a transformer model for sentiment analysis.\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### **1. Import Required Libraries**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "import os\n", "from sklearn.model_selection import train_test_split\n", "from collections import Counter\n", "from itertools import chain\n", "import numpy as np\n", "import re" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "#### **2. Load and Preprocess IMDB Dataset**\n", "Assume the dataset is in the directory aclImdb/ with subdirectories train/pos, train/neg, test/pos, test/neg." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def load_imdb_data(base_path):\n", " texts, labels = [], []\n", " for label, sentiment in enumerate([\"neg\", \"pos\"]):\n", " folder = os.path.join(base_path, sentiment)\n", " for file_name in os.listdir(folder):\n", " with open(os.path.join(folder, file_name), \"r\", encoding=\"utf-8\") as f:\n", " texts.append(f.read().strip())\n", " labels.append(label)\n", " return texts, labels\n", "\n", "# Load training and testing data\n", "train_texts, train_labels = load_imdb_data(\"aclImdb/train\")\n", "test_texts, test_labels = load_imdb_data(\"aclImdb/test\")\n", "\n", "# Split training data into training and validation sets\n", "train_texts, val_texts, train_labels, val_labels = train_test_split(\n", " train_texts, train_labels, test_size=0.2, random_state=42\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "#### **3. 
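{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A quick check (assuming the standard aclImdb release of 25,000 training and 25,000 test reviews): after the 80/20 split above, the set sizes should match the comment below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Expected with the standard aclImdb release: 20000 5000 25000\n", "print(len(train_texts), len(val_texts), len(test_texts))\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "#### **3. 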
Tokenize and Build Vocabulary**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def clean_text(text):\n", "    # Strip HTML tags (e.g., <br />) and lowercase the text\n", "    return re.sub(r\"<.*?>\", \"\", text).lower()\n", "\n", "def tokenize(texts):\n", "    return [clean_text(text).split() for text in texts]\n", "\n", "train_tokens = tokenize(train_texts)\n", "val_tokens = tokenize(val_texts)\n", "test_tokens = tokenize(test_texts)\n", "\n", "# Build vocabulary from the 20,000 most frequent training tokens;\n", "# index 0 is reserved for padding and out-of-vocabulary tokens\n", "token_counts = Counter(chain.from_iterable(train_tokens))\n", "vocab = {word: i + 1 for i, (word, _) in enumerate(token_counts.most_common(20000))}\n", "\n", "# Convert tokens to numerical sequences (unknown tokens map to 0)\n", "def tokens_to_ids(tokens, vocab):\n", "    return [[vocab.get(token, 0) for token in text] for text in tokens]\n", "\n", "train_sequences = tokens_to_ids(train_tokens, vocab)\n", "val_sequences = tokens_to_ids(val_tokens, vocab)\n", "test_sequences = tokens_to_ids(test_tokens, vocab)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **4. Create Data Loaders**" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import DataLoader, Dataset\n", "\n", "class IMDBDataset(Dataset):\n", "    def __init__(self, sequences, labels, max_len=512):\n", "        self.sequences = sequences\n", "        self.labels = labels\n", "        self.max_len = max_len\n", "\n", "    def __len__(self):\n", "        return len(self.sequences)\n", "\n", "    def __getitem__(self, idx):\n", "        # Truncate to max_len and right-pad with zeros\n", "        seq = self.sequences[idx][:self.max_len]\n", "        padded_seq = torch.zeros(self.max_len, dtype=torch.long)\n", "        padded_seq[:len(seq)] = torch.tensor(seq)\n", "        label = self.labels[idx]\n", "        return padded_seq, torch.tensor(label)\n", "\n", "train_dataset = IMDBDataset(train_sequences, train_labels)\n", "val_dataset = IMDBDataset(val_sequences, val_labels)\n", "test_dataset = IMDBDataset(test_sequences, test_labels)\n", "\n", "train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n", "val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)\n", "test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **5. Define the Transformer-Based Model**" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "class TransformerClassifier(nn.Module):\n", "    def __init__(self, embed_size, num_heads, ff_hidden_dim, num_layers, dropout, vocab_size, max_len, num_classes=2):\n", "        super(TransformerClassifier, self).__init__()\n", "        self.encoder = TransformerEncoder(\n", "            embed_size, num_heads, ff_hidden_dim, num_layers, dropout, vocab_size, max_len\n", "        )\n", "        self.fc = nn.Linear(embed_size, num_classes)\n", "\n", "    def forward(self, x, mask=None):\n", "        encoded = self.encoder(x, mask)\n", "        # Take the mean across tokens (a simplification: padded positions\n", "        # are included in the average)\n", "        pooled = torch.mean(encoded, dim=1)\n", "        return self.fc(pooled)\n" ] }, 
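{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The classifier above never passes a `mask`, so attention also attends to padded positions. Below is a minimal sketch (illustrative, not used in the training run; it assumes padding id 0 as in `IMDBDataset`) of a padding mask that the existing `scaled_dot_product_attention` can consume." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative padding mask. Shape (batch, 1, 1, seq_len) broadcasts against\n", "# the (batch, num_heads, seq_len, seq_len) score tensor inside\n", "# scaled_dot_product_attention, so padded positions get -inf scores.\n", "def make_padding_mask(token_ids):\n", "    return (token_ids != 0).unsqueeze(1).unsqueeze(2)\n", "\n", "# Hypothetical usage: logits = model(texts, mask=make_padding_mask(texts))\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **6. 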
Training and Evaluation Functions**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def train_model(model, train_loader, test_loader, num_epochs=5, lr=0.001):\n", " device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", " model = model.to(device)\n", " criterion = nn.CrossEntropyLoss()\n", " optimizer = optim.Adam(model.parameters(), lr=lr)\n", "\n", " for epoch in range(num_epochs):\n", " model.train()\n", " total_loss = 0\n", " for texts, labels in train_loader:\n", " texts, labels = texts.to(device), labels.to(device)\n", " outputs = model(texts)\n", " loss = criterion(outputs, labels)\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", " total_loss += loss.item()\n", "\n", " print(f\"Epoch {epoch+1}, Loss: {total_loss / len(train_loader)}\")\n", "\n", " evaluate_model(model, test_loader)\n", "\n", "def evaluate_model(model, test_loader):\n", " device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", " model.eval()\n", " correct, total = 0, 0\n", " with torch.no_grad():\n", " for texts, labels in test_loader:\n", " texts, labels = texts.to(device), labels.to(device)\n", " outputs = model(texts)\n", " predictions = torch.argmax(outputs, dim=1)\n", " correct += (predictions == labels).sum().item()\n", " total += labels.size(0)\n", "\n", " print(f\"Test Accuracy: {correct / total * 100:.2f}%\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **7. Initialize, Train and Evaluate the Model**" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1, Loss: 0.5563821067333221\n", "Test Accuracy: 79.79%\n", "Epoch 2, Loss: 0.39982056922912595\n", "Test Accuracy: 81.18%\n", "Epoch 3, Loss: 0.31622911648750307\n", "Test Accuracy: 84.70%\n", "Epoch 4, Loss: 0.2586984635293484\n", "Test Accuracy: 84.72%\n", "Epoch 5, Loss: 0.21333774099349975\n", "Test Accuracy: 84.52%\n" ] } ], "source": [ "# Model Hyperparameters\n", "embed_size = 128\n", "num_heads = 4\n", "ff_hidden_dim = 256\n", "num_layers = 2\n", "dropout = 0.1\n", "vocab_size = len(vocab) + 1\n", "max_len = 512\n", "\n", "model = TransformerClassifier(embed_size, num_heads, ff_hidden_dim, num_layers, dropout, vocab_size, max_len)\n", "\n", "# Train the Model\n", "train_model(model, train_loader, test_loader, num_epochs=2, lr=0.001)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **8. Saving and Loading a Trained Model**" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model saved to sentiment_model.pth\n", "Model loaded from sentiment_model.pth\n", "tensor([1])\n", "Prediction: Positive\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_11549/3363049353.py:8: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. 
Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.\n", " model.load_state_dict(torch.load(file_path, map_location=device))\n" ] } ], "source": [ "# Function to save the trained model\n", "def save_model(model, file_path):\n", " torch.save(model.state_dict(), file_path)\n", " print(f\"Model saved to {file_path}\")\n", "\n", "# Function to load the trained model\n", "def load_model(model, file_path, device='cpu'):\n", " model.load_state_dict(torch.load(file_path, map_location=device))\n", " model.to(device)\n", " print(f\"Model loaded from {file_path}\")\n", " return model\n", "\n", "# Save the trained model\n", "save_model(model, \"sentiment_model.pth\")\n", "\n", "# Create a new instance of the model and load the saved weights\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", "new_model = TransformerClassifier(embed_size, num_heads, ff_hidden_dim, num_layers, dropout, vocab_size, max_len)\n", "new_model = load_model(new_model, \"sentiment_model.pth\", device)\n", "\n", "# Verify the model works as expected\n", "new_model.eval()\n", "sample_input = torch.tensor([[vocab[word] for word in [\"this\", \"movie\", \"was\", \"amazing\"]]], device=device)\n", "output = new_model(sample_input)\n", "prediction = torch.argmax(output, dim=1)\n", "print(f\"Prediction: {'Positive' if prediction == 1 else 'Negative'}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### **9. Display Random Test Samples, Predictions, and Ground Truth**" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample 1:\n", "Input: actually there was nothing funny about this monstrosity at this movie was a complete the in this movie almost made me want to i think that the people responsible for this movie took advantage of their viewing audience. they took a relatively decent series of movies (i did say decent, not and totally trashed it by trying to put money in their the making of was a way for hollywood to make up for this crappy flick. the worst part about it is that either nobody in 1979 realized the asinine events of the movie (such as door popping off at some high or shooting a flair gun out the window at 2 to avoid a nuclear were they totally unrealistic or they just didn't i think that it is the latter of the two. the writers and director of this if you want to call it that, really tried to suck the airport dry with this crap!\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 2:\n", "Input: ok, i'm italian but there aren't so many italian film like this. i think that the plot is very good for 3/4 of the film but the final is too simple, too predictable. but it's the only little mistake. the consequences of love in my opinion have great sequences in particular at the beginning and great soundtrack. i'd like very much the lighting work on it. the best thing on it is a great, great actor. you know, if your name were al pacino now everybody would have still been talking about this performance. 
but it's only a great theater italian actor called toni yes, someone tell me this film and this kind of performance it's too slow, it's so boring, so many but i think that this components its fantastic, its the right way for describing the love story between a very talented young girl, the of the italian actress anna olivia and the old mysterious man one of my favorite italian films.\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 3:\n", "Input: this is one of a very few movies with terrific acting, wonderful story line and worthwhile to watch. it is about changes in life. it is about how happiness can be found when one is true to it is about accepting oneself and others. it is about life it is about coming to term with reality. it is about it is about love. both actresses are very true to their role. the actresses had very good chemistry between one another in the movie. this is the key of the movie that made the movie. it is rare to see an independent film like this one. one could tell the hard work that the film crew had to have while producing this movie. i wish to see more movie from this see the movie, you will love it.\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 4:\n", "Input: the quick and the undead is, finally, the first movie to actually render its own storyline and it is, essentially, one gigantic plot from that, the acting was quite bad, character motivations nonexistent or unbelievable and there wasn't a single character worth hanging our hat on. the most interesting cast member (who had great potential to be a dark horse got halfway through the the quick and the undead does serve as is an excellent example of how to do good it looked excellent, when you take into account budget it plays out like a guy got his hands on a hundred grand and watched a few westerns (most notably the good, the bad and the and then just threw a bunch of elements into a movie... \"you know, they have movies where characters do this! does it fit here? no, but who they do it in other movies so i should do it maybe a good view for burgeoning and otherwise, a\n", "Prediction: Positive\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 5:\n", "Input: i loved this movie so much. i'm a big fan of amanda recently ended show. i admire her for her acting she is a good movie was great. its about a girl named who wants to play but when her school cuts the girls soccer team she gets her brother is set to go to a prestigious school and he decides to leave to england. so wants to make an impression by playing on the soccer team at the boarding school. she goes to the school and tries out for the soccer team. she gets in. meanwhile she meets duke who is a sensitive guy who plays on the soccer team. he really likes olivia who likes is really sebastian is dating and suspects that sebastian isn't being is certainly not a chick flick and i enjoyed it a lot. its so funny and i don't think i have seen amanda act better.\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 6:\n", "Input: there is nothing good to say about this movie. 
read revolution for the hell of it or any of his other was often dismissed as the of the but he was a man of ideas who used his his sense of humor and pop culture, and his flamboyant personality to get attention to his ideas. the media too often concentrated on the man, not the ideas, and that's the problem with this movie, too. later in his life he did suffer from depression. but this flick is like a national version of he deserves better. if you don't know or his times, this movie won't help. this film i give it a zero.\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 7:\n", "Input: first of all, nothing will ever compare to the original movie, but for they're not trying to. it is just one persons opinion about what could have happened after left scarlett at i for one thought it was a terrific movie and would like to add it to my collection. the scenery alone would make me want to watch the movie. just view this movie as an extension of the original and don't think they are trying to replace vivian leigh and clark cable and you will enjoy it a lot. they really captured the spoiled of scarlett in many of the scenes and you can see from the longing in the looks from that he is clearly still in love with the fact that you can recognize many of the actors in the movie is another plus even though some of them have only been seen on tv. i always wanted them to have other children after bonnie blue died in the movie and this satisfied my need perfectly.\n", "Prediction: Negative\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 8:\n", "Input: years ago, i used to watch bad movies somehow i missed this one. no gesture rings true. no facial expression fits the scene or the action. i've never heard such inappropriate music for a film. at the final scene, i was rooting for the car to run over that ridiculous kid - one of the worst child actors one name in it i ever heard of - he must've been very hungry to take this not under any circumstances, watch this movie!!! you've been\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 9:\n", "Input: i saw this at the 2004 film festival in ny and it was very received. in this film, a pair of german rocket scientists are working on the scottish of as war on the the characters encountered on the island are priceless in their creation and their portrayal. macdonald is particularly getting up to speed on the accents, the viewer feel right at home with these folk who watch with amusement as the germans work to link their with the via a mail delivery. as implausible as it seems, this film was based on an actual in all, a memorable film that will stay with you for some time thanks to its casting, its story or its scenery.\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 10:\n", "Input: i saw two hands back in sydney a few years ago and it instantly became one of my all-time favourite films. 
it's got action, adventure, comedy and romance all rolled up into one (and a bit of zen thrown in for good like much australian film, the plot is easy to follow yet wonderfully engaging, and jordan should feel proud of his it was on tv just now on channel 4 in london, and my two favourite comedy scenes of not just this movie, but indeed any movie, had been cut out! so if you watch this movie, make sure it's the original version.\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 11:\n", "Input: i give it a 2 - i a 1 rating for guy ritchie and woody allen films. we don't even remember what this movie was about. the only thing we recall is one gunshot scene where the actors drop to the ground, roll to the other side of a or something and then get back up shooting. it was like watching with 2 broken legs trying to perform the also, when the characters were driving in a truck, the engine noise (or can't would entirely when the actors were like others, we bought it because of the sandra bullock front cover. very sad, very bad.\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 12:\n", "Input: 35 years after this was made, castro still unfortunately, we're left scratching our heads wondering how the maniac played by jack palance made it as far as i stumbled back across this recently, and was amused at noticing the incomparable sid and \"b\" movie favorite paul among the was obviously well past his prime when he directed this some of the lines are classic in a he really say that kind of the other thing i just noticed is that the score and the sound (not the are actually excellent -- the only first-rate elements of the entire production. so, don't watch this to learn anything about history or acting, but if you feel like watching this as a bring the beers and have some fun.\n", "Prediction: Positive\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 13:\n", "Input: this was alright. it was one of those we but we don't have enough evidence yet in the couple of movies i've seen her in, i've never really though much of stephanie a professional tv actress she is but nothing really outstanding. here in this she was definitely above average as the former fed (or was it fed on her character got along well with the motley bunch of special investigation unit cops she was assigned with. there wasn't really a goofy character you'd roll your eyes at and just despise which was good. also good is it takes awhile to know who the murderer is... but when i found out i wasn't that surprised. oh well. one more thing that was good was the los angeles locations. quite possibly if this was made today they'd use toronto or but here they really shot in downtown l.a. like that a lot (even though i liked the movie, too. i don't know if i'd ever watch it again but it wasn't too bad. my grade: b-\n", "Prediction: Negative\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 14:\n", "Input: a widely unknown strange little western with colours (probably the same material as it was used in \"johnny i guess or something, which makes blood look like shining nearly surrealistic scenes with twisted action and characters. 
something different, far from being a masterpiece, but there should be paid more attention to this little gem in western\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 15:\n", "Input: oh, well i thought it should be a good action, but it was not. although jeff stars there is nothing to two fight for almost hours, lot of talking and everything is so artificial that you could not believe it. the plot is clear from the beginning. if you want good action don't rent this movie.\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 16:\n", "Input: this movie is not very bad but one cannot find anything new about the personality of marquis de sade from this movie. the movie tries to stay on the borderline between erotic and insightful and it cannot succeed at either. the cinematography is really bad video\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 17:\n", "Input: let me say this about edward d. wood jr. he had a passion for his work that i wish more people did have. if we all had the optimism and the commanding hope of ed wood, the world would probably be a much better place. being familiar with ed wood's story and having seen the most wonderful biopic several times, i admire his and his strives for the job he i still admire his attitude. he had a love for directing that i wish more people in modern-day hollywood that doesn't make his movies any more fun to watch. and or his first and most film, is probably his very or is a cult movie about a named glen (played by director/writer ed wood himself) who despite his love for his fiancée barbara cannot seem to conquer his lust for in which he dresses in women's clothing and a wig and thus story is narrated by a doctor and he too is talked and watched over by a mysterious character called \"the played by veteran horror star bela oh, and there's also some about an character who becomes a based on the christine story, upon whom this movie originally titled \"i changed my was previously to be i dropped your jaw yet? well, as much as i want to warn you off this picture if you've never seen it, i would never tell a lie about a movie and there is not one word of in that plot synopsis i just gave you. every thing in it is true. this is a movie about and a topic that does not sound very appealing to begin with and is not done in a very appealing manner. i'm sure that with a good screenplay, and a good director (it had that or despite the subject matter, could have been a very moving picture. it is a movie on wood's part, as he was a transvestite in real life as well as on screen. but once again, that does not make it a good a watchable one for that matter. or is a mess of a movie that sinks into new in the realm of bad cinema. it makes no more sense than does its notoriously silly scene where bela lugosi screams the over inexplicable footage of the majority of the movie is narrated in a monotonous voice, reminding me of some very bad short informative films i've seen before. it's like one of those really bad short films expanded into a feature and twice as dull. 
we sit there for ages waiting for the plot that never there is no real attempt to even build energy with the camera being locked down in one position for many minutes and long stretches of time where nothing at all\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 18:\n", "Input: i got this movie in a bargain hoping for an amusingly bad flick. boy was i disappointed. (except for you see, the movie is indeed horrible, but so horrible, it isn't even laughable. the plot, oh wait, there is no plot. i suppose you could say it's about the main character rising up in the ranks of street fighting. at the end of the movie, the directors decided to either not make any more sense, or, more died and had a monkey finish directing the movie. don't read if you don't want the ending although the ending doesn't really spoil anything. the main character somehow ends up in a room filled with mirrors, a la enter the dragon, and then gets real angry, has stupid flashback, and hits a mirror. the end. only redeeming factor of this movie was scene. he's talking to the rival street fighting boss and says something along these lines, completely \"do not worry about him anymore i have killed him in a sophisticated manner. i him, i we went to a we was a lot of fun. and then i killed at which point the boss says \"good work you're number and says the scene continues with continuing to say over and over. the next scene is of a dead floating in a pool. i think i own the version of this movie, meaning the title on the box i own is it shows a huge guy holding a giant gun and this never happens in the movie. this man is never in the movie. high am new to this but hell yes i am going to keep it up.\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n", "Sample 19:\n", "Input: this was a modern tv classic! the story goes like this, bob has a girlfriend named bob and alicia are in love and want to get married. bob has a bud named owen who he works with. owen is jealous of he likes hanging around with now owen hangs around with bob and bob and owen have a secretary named heather is very accident she is also, if you haven't guessed it, is kind of she too hangs around with bob and and sometimes alicia wishes bob didn't have any the end of the first season, it looked like owen finally found himself a real girlfriend. bob and alicia went driving on the night before their wedding, making out in a tiny car and then getting way out in the middle of what happened next? what about poor did anyone get sorry, series even a from tv guide made fox think twice about putting it back on the air. it was a great show with a great cast. i loved heather too. she looked cool in those glasses and was hilarious. i miss this show a lot. this is like reading a good book with the ending\n", "Prediction: Positive\n", "Ground Truth: Positive\n", "--------------------------------------------------------------------------------\n", "Sample 20:\n", "Input: was inspired by events on the track during ww2 when australian and ultimately stopped a push by japanese soldiers to move and capture port what they really mean is that the movie is set in this time period but is fiction and everything that happens is just a of standard scenes from other war films. the first hour is just one cliché after another. 
some of the scenes are simply there to be able to draw us into a feeling that this conflict was horrific beyond when there appears to be little evidence of this. both sides fought hard to control the track and no mercy was shown by either side. both sides suffered from and the terrain was a great in this conflict. as the japanese got closer to port their supply line grew and this ultimately led to their on the other hand as the australians closer to port their supply line some of the scenes appear to be straight out of the on standard scenes to include in any war film. the film was misguided and highlighted the youth of the production team. at a time when australia could have done with a great film about one of best moments the film is a shallow disappointment.\n", "Prediction: Negative\n", "Ground Truth: Negative\n", "--------------------------------------------------------------------------------\n" ] } ], "source": [ "import random\n", "\n", "def display_random_predictions(model, test_dataset, vocab, num_samples=20):\n", " device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", " model = model.to(device)\n", " model.eval()\n", "\n", " # Select random indices from the test dataset\n", " random_indices = random.sample(range(len(test_dataset)), num_samples)\n", "\n", " # Extract samples and ground truth labels\n", " selected_samples = [test_dataset[i] for i in random_indices]\n", " sequences = torch.stack([sample[0] for sample in selected_samples]).to(device)\n", " ground_truths = [sample[1].item() for sample in selected_samples]\n", "\n", " # Convert numerical sequences back to text\n", " idx_to_word = {idx: word for word, idx in vocab.items()}\n", " input_texts = []\n", " for sample in selected_samples:\n", " words = [idx_to_word[idx.item()] for idx in sample[0] if idx.item() in idx_to_word]\n", " input_texts.append(\" \".join(words))\n", "\n", " # Pass selected samples through the model\n", " with torch.no_grad():\n", " outputs = model(sequences)\n", " predictions = torch.argmax(outputs, dim=1).tolist()\n", "\n", " # Display results\n", " for i in range(num_samples):\n", " print(f\"Sample {i+1}:\")\n", " print(f\"Input: {input_texts[i]}\")\n", " print(f\"Prediction: {'Positive' if predictions[i] == 1 else 'Negative'}\")\n", " print(f\"Ground Truth: {'Positive' if ground_truths[i] == 1 else 'Negative'}\")\n", " print(\"-\" * 80)\n", "\n", "# Call the function\n", "display_random_predictions(new_model, test_dataset, vocab, num_samples=20)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **Outcome**\n", "- We have trained a transformer-based sentiment classifier on the IMDB dataset.\n", "- The `evaluate_model` function computes and displays the test accuracy after each epoch.\n", "\n", "This example provides an end-to-end pipeline for understanding and applying transformers to real-world tasks. You can extend this by experimenting with different datasets and configurations." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **Conclusion**\n", "This tutorial introduces the core components of transformers. You can:\n", "1. Experiment with different architectures (e.g., decoder-based transformers like GPT).\n", "2. Apply transformers to real-world tasks using libraries like Hugging Face Transformers." 
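, "\n", "3. Improve the training pipeline: use the (currently unused) validation set for model selection, and pass a padding mask (see the sketch after Step 5 of the demonstration) so attention ignores padded positions.\n", "\n", "*Note: the reviews printed in Step 9 contain gaps because index 0 (padding and out-of-vocabulary tokens) is skipped when token ids are mapped back to words.*"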
] } ], "metadata": { "kernelspec": { "display_name": "cvml", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }