{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b2588024",
   "metadata": {},
   "source": [
    "<h1 style=\"text-align:center;\">Machine Learning: Linear Regression</h1>\n",
    "<p style=\"text-align:center;\">\n",
    "Nazar Khan\n",
    "<br>CVML Lab\n",
    "<br>University of The Punjab\n",
    "</p>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "37bdfa41",
   "metadata": {},
   "source": [
    "This is a tutorial on linear regression using synthetic data from a sinusoidal curve. We will use three different types of regression models:\n",
    "1. Linear Regression\n",
    "2. Polynomial Regression, and\n",
    "3. Ridge Regression.\n",
    "\n",
    "Each regression model uses a different form of the loss function.\n",
    "\n",
    "This Python notebook will use common libraries like numPy, matplotlib, and scikit-learn. It will demonstrate:\n",
    "\n",
    "- Generating data with noise from a sinusoidal function.\n",
    "- Linear regression on noisy data.\n",
    "- Overfitting using higher-degree polynomial features.\n",
    "- Generalization by increasing the amount of data.\n",
    "- Regularization using Ridge regression to avoid overfitting."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d8997a28",
   "metadata": {},
   "source": [
    "### Step 1: Generate Sinusoidal Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ef206aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.preprocessing import PolynomialFeatures\n",
    "from sklearn.linear_model import LinearRegression, Ridge\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "# Set random seed for reproducibility\n",
    "np.random.seed(42)\n",
    "\n",
    "# Generate true sine wave\n",
    "X_true = np.linspace(-np.pi, np.pi, 100).reshape(-1, 1)\n",
    "y_true = np.sin(X_true)\n",
    "\n",
    "# Add noise to the data for training\n",
    "def generate_noisy_sinusoidal_data(n_points, noise_std):\n",
    "    X = np.linspace(-np.pi, np.pi, n_points).reshape(-1, 1)\n",
    "    y = np.sin(X) + np.random.normal(0, noise_std, size=X.shape)\n",
    "    return X, y\n",
    "\n",
    "# Training data (10 noisy points)\n",
    "X_train, y_train = generate_noisy_sinusoidal_data(10, noise_std=0.2)\n",
    "\n",
    "# Testing data (90 noisy points)\n",
    "X_test, y_test = generate_noisy_sinusoidal_data(90, noise_std=0.2)\n",
    "\n",
    "# Plot the true curve, training data, and testing data\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(X_true, y_true, label='True Sine Wave', color='blue')\n",
    "plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')\n",
    "plt.scatter(X_test, y_test, color='green', label='Testing Data (90 points)', alpha=0.5)\n",
    "plt.legend()\n",
    "plt.title('Training and Testing Data with Noise')\n",
    "plt.show()\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c77bf915",
   "metadata": {},
   "source": [
    "### Step 2: Linear Regression and Overfitting Example\n",
    "\n",
    "Linear regression tries to minimize the Mean Squared Error (MSE) between the predicted values $\\hat{y} and the actual values $y$.\n",
    "\n",
    "Loss Function:\n",
    "\n",
    "$\\text{MSE} = \\frac{1}{n} \\sum_{i=1}^{n} \\left( y_i - \\hat{y}_i \\right)^2$\n",
    "\n",
    "where:\n",
    "\n",
    "- $y_i$ are the true target values.\n",
    "- $\\hat{y}_i$ are the predicted values.\n",
    "- $n$ is the number of data points.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79b2bac8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Perform linear regression (degree=1)\n",
    "poly_features = PolynomialFeatures(degree=1)\n",
    "X_train_poly = poly_features.fit_transform(X_train)\n",
    "X_test_poly = poly_features.fit_transform(X_test)\n",
    "X_true_poly = poly_features.fit_transform(X_true)\n",
    "\n",
    "linear_regressor = LinearRegression()\n",
    "linear_regressor.fit(X_train_poly, y_train)\n",
    "\n",
    "# Predict for both train and test data\n",
    "y_train_pred = linear_regressor.predict(X_train_poly)\n",
    "y_test_pred = linear_regressor.predict(X_test_poly)\n",
    "y_true_pred = linear_regressor.predict(X_true_poly)\n",
    "\n",
    "# Plot the true curve, training points, and linear fit\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(X_true, y_true, label='True Sine Wave', color='blue')\n",
    "plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')\n",
    "plt.plot(X_true, y_true_pred, label='Linear Fit (Degree 1)', color='black')\n",
    "plt.legend()\n",
    "plt.title('Linear Regression (Degree 1)')\n",
    "plt.show()\n",
    "\n",
    "# Evaluate the error on both train and test sets\n",
    "train_error = mean_squared_error(y_train, y_train_pred)\n",
    "test_error = mean_squared_error(y_test, y_test_pred)\n",
    "print(f\"Linear Regression (Degree 1) - Training Error: {train_error:.4f}, Test Error: {test_error:.4f}\")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a923f350",
   "metadata": {},
   "source": [
    "### Step 3: Overfitting using Higher Degrees\n",
    "\n",
    "Polynomial regression is essentially linear regression with polynomial features. The model still minimizes the MSE, but on a transformed feature space (polynomial basis).\n",
    "\n",
    "Loss Function:\n",
    "\n",
    "$\\text{MSE} = \\frac{1}{n} \\sum_{i=1}^{n} \\left( y_i - \\hat{y}_i \\right)^2$\n",
    " \n",
    "The difference is that $\\hat{y}_i$ comes from a polynomial model (e.g., degree 9 in the example). The MSE is still calculated the same way, but the model is more complex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82a1c989",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try polynomial regression with a higher degree (degree=9)\n",
    "poly_features = PolynomialFeatures(degree=9)\n",
    "X_train_poly = poly_features.fit_transform(X_train)\n",
    "X_test_poly = poly_features.fit_transform(X_test)\n",
    "X_true_poly = poly_features.fit_transform(X_true)\n",
    "\n",
    "poly_regressor = LinearRegression()\n",
    "poly_regressor.fit(X_train_poly, y_train)\n",
    "\n",
    "# Predict for both train and test data\n",
    "y_train_pred_poly = poly_regressor.predict(X_train_poly)\n",
    "y_test_pred_poly = poly_regressor.predict(X_test_poly)\n",
    "y_true_pred_poly = poly_regressor.predict(X_true_poly)\n",
    "\n",
    "# Plot the true curve, training points, and high-degree polynomial fit\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(X_true, y_true, label='True Sine Wave', color='blue')\n",
    "plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')\n",
    "plt.plot(X_true, y_true_pred_poly, label='Polynomial Fit (Degree 9)', color='black')\n",
    "plt.legend()\n",
    "plt.title('Overfitting: Polynomial Regression (Degree 9)')\n",
    "plt.show()\n",
    "\n",
    "# Evaluate the error on both train and test sets\n",
    "train_error_poly = mean_squared_error(y_train, y_train_pred_poly)\n",
    "test_error_poly = mean_squared_error(y_test, y_test_pred_poly)\n",
    "print(f\"Polynomial Regression (Degree 9) - Training Error: {train_error_poly:.4f}, Test Error: {test_error_poly:.4f}\")\n",
    "print(\"Learned parameters:\\n\", poly_regressor.coef_)\n",
    "print(\"Magnitude of learned parameters vector: \", np.linalg.norm(poly_regressor.coef_))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79d92e01",
   "metadata": {},
   "source": [
    "### Step 4: Generalization by Using More Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47c18330",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use both training and testing data (100 points) to prevent overfitting\n",
    "X_all = np.concatenate([X_train, X_test])\n",
    "y_all = np.concatenate([y_train, y_test])\n",
    "\n",
    "poly_features = PolynomialFeatures(degree=9)\n",
    "X_all_poly = poly_features.fit_transform(X_all)\n",
    "X_true_poly = poly_features.fit_transform(X_true)\n",
    "\n",
    "poly_regressor.fit(X_all_poly, y_all)\n",
    "\n",
    "# Predict for the full dataset\n",
    "y_true_pred_all = poly_regressor.predict(X_true_poly)\n",
    "\n",
    "# Plot the true curve and new polynomial fit (using 100 points)\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(X_true, y_true, label='True Sine Wave', color='blue')\n",
    "plt.scatter(X_all, y_all, color='orange', label='All Data (100 points)')\n",
    "plt.plot(X_true, y_true_pred_all, label='Polynomial Fit (Degree 9, All Data)', color='black')\n",
    "plt.legend()\n",
    "plt.title('Generalization with More Data: Polynomial Regression (Degree 9)')\n",
    "plt.show()\n",
    "\n",
    "# Evaluate the error on the larger dataset\n",
    "train_error_all = mean_squared_error(y_all, poly_regressor.predict(X_all_poly))\n",
    "print(f\"Polynomial Regression (Degree 9, All Data) - Error: {train_error_all:.4f}\")\n",
    "print(\"Learned parameters:\\n\", poly_regressor.coef_)\n",
    "print(\"Magnitude of learned parameters vector: \", np.linalg.norm(poly_regressor.coef_))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9217ac87",
   "metadata": {},
   "source": [
    "### Step 5: Generalization Using Ridge Regularization\n",
    "\n",
    "Ridge regression modifies the linear (or polynomial) regression loss by adding a regularization term to penalize large weights. This helps prevent overfitting.\n",
    "\n",
    "Loss Function:\n",
    "\n",
    "$\\text{Ridge Loss} = \\frac{1}{n} \\sum_{i=1}^{n} \\left( y_i - \\hat{y}_i \\right)^2 + \\alpha \\sum_{j=1}^{p} w_j^2$\n",
    " \n",
    "where:\n",
    "\n",
    "- $\\frac{1}{n} \\sum_{i=1}^{n} \\left( y_i - \\hat{y}_i \\right)^2$ is the MSE.\n",
    "- $\\sum_{j=1}^{p} w_j^2$ is the L2 regularization term.\n",
    "- $\\alpha$ is a regularization hyperparameter controlling the strength of the penalty.\n",
    "- $w_j$ are the model parameters (weights).\n",
    "\n",
    "The regularization term penalizes large values of weights, effectively discouraging overfitting by smoothing the model's parameters.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6e06e18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ridge regression with regularization (L2 penalty)\n",
    "ridge_regressor = Ridge(alpha=1.0)\n",
    "ridge_regressor.fit(X_train_poly, y_train)\n",
    "\n",
    "# Predict for train, test, and true data\n",
    "y_train_pred_ridge = ridge_regressor.predict(X_train_poly)\n",
    "y_test_pred_ridge = ridge_regressor.predict(X_test_poly)\n",
    "y_true_pred_ridge = ridge_regressor.predict(X_true_poly)\n",
    "\n",
    "# Plot the true curve, training points, and Ridge regression fit\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(X_true, y_true, label='True Sine Wave', color='blue')\n",
    "plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')\n",
    "plt.plot(X_true, y_true_pred_ridge, label='Ridge Fit (Degree 9)', color='black')\n",
    "plt.legend()\n",
    "plt.title('Ridge Regularization: Polynomial Regression (Degree 9)')\n",
    "plt.show()\n",
    "\n",
    "# Evaluate the error with Ridge regularization\n",
    "train_error_ridge = mean_squared_error(y_train, y_train_pred_ridge)\n",
    "test_error_ridge = mean_squared_error(y_test, y_test_pred_ridge)\n",
    "print(f\"Ridge Regression (Degree 9) - Training Error: {train_error_ridge:.4f}, Test Error: {test_error_ridge:.4f}\")\n",
    "print(\"Learned parameters:\\n\", ridge_regressor.coef_)\n",
    "print(\"Magnitude of learned parameters vector: \", np.linalg.norm(ridge_regressor.coef_))\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "47801f22",
   "metadata": {},
   "source": [
    "### Summary of Loss Functions\n",
    "- Linear Regression: Minimizes the Mean Squared Error (MSE).\n",
    "- Polynomial Regression: Same as linear regression but applied to polynomial features.\n",
    "- Ridge Regression: Minimizes MSE with an additional L2 regularization term to prevent overfitting."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b40c1cb",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dl_pt",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}