<h1 style="text-align:center;">Machine Learning: Linear Regression</h1>
<p style="text-align:center;">
Nazar Khan
<br>CVML Lab
<br>University of The Punjab
</p>

This tutorial demonstrates linear regression on synthetic data sampled from a sinusoidal curve. We will use three regression models:

1. Linear Regression
2. Polynomial Regression
3. Ridge Regression

Linear and polynomial regression minimize the same loss, the mean squared error, but on different features. Ridge regression adds a regularization term to that loss.

This Python notebook uses the common libraries NumPy, matplotlib, and scikit-learn. It demonstrates:

- Generating data with noise from a sinusoidal function.
- Linear regression on noisy data.
- Overfitting using higher-degree polynomial features.
- Generalization by increasing the amount of data.
- Regularization using Ridge regression to avoid overfitting.

### Step 1: Generate Sinusoidal Data

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Set random seed for reproducibility
np.random.seed(42)

# Generate true sine wave
X_true = np.linspace(-np.pi, np.pi, 100).reshape(-1, 1)
y_true = np.sin(X_true)

# Sample noisy observations of the sine curve on an evenly spaced grid
def generate_noisy_sinusoidal_data(n_points, noise_std):
    """Return n_points inputs in [-pi, pi] and sin(x) plus Gaussian noise."""
    X = np.linspace(-np.pi, np.pi, n_points).reshape(-1, 1)
    y = np.sin(X) + np.random.normal(0, noise_std, size=X.shape)
    return X, y

# Training data (10 noisy points)
X_train, y_train = generate_noisy_sinusoidal_data(10, noise_std=0.2)

# Testing data (90 noisy points)
X_test, y_test = generate_noisy_sinusoidal_data(90, noise_std=0.2)

# Plot the true curve, training data, and testing data
plt.figure(figsize=(10, 6))
plt.plot(X_true, y_true, label='True Sine Wave', color='blue')
plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')
plt.scatter(X_test, y_test, color='green', label='Testing Data (90 points)', alpha=0.5)
plt.legend()
plt.title('Training and Testing Data with Noise')
plt.show()


### Step 2: Linear Regression

Linear regression minimizes the Mean Squared Error (MSE) between the predicted values $\hat{y}$ and the actual values $y$.

Loss Function:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$

where:

- $y_i$ are the true target values.
- $\hat{y}_i$ are the predicted values.
- $n$ is the number of data points.
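
To make the formula concrete, here is a minimal sketch that computes the MSE directly from its definition and compares it with scikit-learn's `mean_squared_error`. The arrays `y_actual` and `y_predicted` are hypothetical toy values chosen for easy arithmetic; they are not part of the notebook's data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical toy values, chosen only to keep the arithmetic easy to follow
y_actual = np.array([1.0, 2.0, 3.0, 4.0])
y_predicted = np.array([1.5, 2.0, 2.0, 4.5])

# MSE straight from the formula: the mean of the squared residuals
residuals = y_actual - y_predicted
mse_manual = np.mean(residuals ** 2)

# The same quantity computed by scikit-learn
mse_sklearn = mean_squared_error(y_actual, y_predicted)

print(mse_manual, mse_sklearn)
```

The squared residuals here are 0.25, 0, 1, and 0.25, so both computations give 1.5 / 4 = 0.375.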


In [None]:
# Perform linear regression (degree=1)
poly_features = PolynomialFeatures(degree=1)
# Fit the feature transformer on the training inputs, then reuse it for the other sets
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)
X_true_poly = poly_features.transform(X_true)

linear_regressor = LinearRegression()
linear_regressor.fit(X_train_poly, y_train)

# Predict for both train and test data
y_train_pred = linear_regressor.predict(X_train_poly)
y_test_pred = linear_regressor.predict(X_test_poly)
y_true_pred = linear_regressor.predict(X_true_poly)

# Plot the true curve, training points, and linear fit
plt.figure(figsize=(10, 6))
plt.plot(X_true, y_true, label='True Sine Wave', color='blue')
plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')
plt.plot(X_true, y_true_pred, label='Linear Fit (Degree 1)', color='black')
plt.legend()
plt.title('Linear Regression (Degree 1)')
plt.show()

# Evaluate the error on both train and test sets
train_error = mean_squared_error(y_train, y_train_pred)
test_error = mean_squared_error(y_test, y_test_pred)
print(f"Linear Regression (Degree 1) - Training Error: {train_error:.4f}, Test Error: {test_error:.4f}")


### Step 3: Overfitting using Higher Degrees

Polynomial regression is essentially linear regression with polynomial features. The model still minimizes the MSE, but on a transformed feature space (polynomial basis).

Loss Function:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$
 
The difference is that $\hat{y}_i$ comes from a polynomial model (e.g., degree 9 in the example). The MSE is still calculated the same way, but the model is more complex.
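
As a small illustration of "linear regression with polynomial features", the sketch below expands a single input column into the columns $1, x, x^2$ and then fits an ordinary `LinearRegression` on them. The inputs `x_demo` and the quadratic used to generate `y_demo` are assumptions made for this example only. Because the targets are noise-free, the fit should recover the generating coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data from y = 1 + 2x + 3x^2 (not the notebook's sine data)
x_demo = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
y_demo = 1.0 + 2.0 * x_demo.ravel() + 3.0 * x_demo.ravel() ** 2

# Degree-2 expansion: each row becomes [1, x, x^2]
demo_features = PolynomialFeatures(degree=2)
x_demo_poly = demo_features.fit_transform(x_demo)

# The model is still linear in its weights; only the features are nonlinear in x
demo_model = LinearRegression().fit(x_demo_poly, y_demo)

print(x_demo_poly)
print("intercept:", demo_model.intercept_)
print("coefficients:", demo_model.coef_)
```

Since `LinearRegression` fits its own intercept by default, the weight on the constant column is redundant and the constant term of the polynomial appears in `intercept_` instead.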

In [None]:
# Try polynomial regression with a higher degree (degree=9)
poly_features = PolynomialFeatures(degree=9)
# Fit the feature transformer on the training inputs, then reuse it for the other sets
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)
X_true_poly = poly_features.transform(X_true)

poly_regressor = LinearRegression()
poly_regressor.fit(X_train_poly, y_train)

# Predict for both train and test data
y_train_pred_poly = poly_regressor.predict(X_train_poly)
y_test_pred_poly = poly_regressor.predict(X_test_poly)
y_true_pred_poly = poly_regressor.predict(X_true_poly)

# Plot the true curve, training points, and high-degree polynomial fit
plt.figure(figsize=(10, 6))
plt.plot(X_true, y_true, label='True Sine Wave', color='blue')
plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')
plt.plot(X_true, y_true_pred_poly, label='Polynomial Fit (Degree 9)', color='black')
plt.legend()
plt.title('Overfitting: Polynomial Regression (Degree 9)')
plt.show()

# Evaluate the error on both train and test sets
train_error_poly = mean_squared_error(y_train, y_train_pred_poly)
test_error_poly = mean_squared_error(y_test, y_test_pred_poly)
print(f"Polynomial Regression (Degree 9) - Training Error: {train_error_poly:.4f}, Test Error: {test_error_poly:.4f}")
print("Learned parameters:\n", poly_regressor.coef_)
print("Magnitude of learned parameters vector: ", np.linalg.norm(poly_regressor.coef_))
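
To see how model complexity affects the two errors, one option is to repeat the fit for several degrees. The following is a minimal sketch under stated assumptions: it draws its own data with a local random generator (seed 0, so the numbers will differ from the cells above), and the helper `make_noisy_sine` and the degree list are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Local generator so the notebook's global random state is left untouched
rng = np.random.default_rng(0)

def make_noisy_sine(n_points, noise_std):
    """Hypothetical helper mirroring the notebook's data generator."""
    x = np.linspace(-np.pi, np.pi, n_points).reshape(-1, 1)
    y = np.sin(x) + rng.normal(0, noise_std, size=x.shape)
    return x, y

x_fit, y_fit = make_noisy_sine(10, 0.2)
x_holdout, y_holdout = make_noisy_sine(90, 0.2)

# Map each degree to (training MSE, held-out MSE)
sweep_errors = {}
for degree in (1, 3, 5, 9):
    features = PolynomialFeatures(degree=degree)
    x_fit_poly = features.fit_transform(x_fit)
    x_holdout_poly = features.transform(x_holdout)
    model = LinearRegression().fit(x_fit_poly, y_fit)
    train_mse = mean_squared_error(y_fit, model.predict(x_fit_poly))
    holdout_mse = mean_squared_error(y_holdout, model.predict(x_holdout_poly))
    sweep_errors[degree] = (train_mse, holdout_mse)
    print(f"degree={degree}: train MSE={train_mse:.4f}, held-out MSE={holdout_mse:.4f}")
```

Because each lower-degree model is a special case of the higher-degree ones, the training error cannot increase with the degree. The held-out error has no such guarantee, and how it behaves at degree 9 depends on the particular noise draw.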


### Step 4: Generalization by Using More Data

With far more data points than polynomial coefficients, the degree-9 model can no longer pass through every noisy point, so the fitted curve is forced to follow the overall trend of the data.

In [None]:
# Fit on the combined training and testing data (100 points) to reduce overfitting
X_all = np.concatenate([X_train, X_test])
y_all = np.concatenate([y_train, y_test])

poly_features = PolynomialFeatures(degree=9)
X_all_poly = poly_features.fit_transform(X_all)
X_true_poly = poly_features.transform(X_true)

poly_regressor.fit(X_all_poly, y_all)

# Predict on the dense grid used to draw the true curve
y_true_pred_all = poly_regressor.predict(X_true_poly)

# Plot the true curve and new polynomial fit (using 100 points)
plt.figure(figsize=(10, 6))
plt.plot(X_true, y_true, label='True Sine Wave', color='blue')
plt.scatter(X_all, y_all, color='orange', label='All Data (100 points)')
plt.plot(X_true, y_true_pred_all, label='Polynomial Fit (Degree 9, All Data)', color='black')
plt.legend()
plt.title('Generalization with More Data: Polynomial Regression (Degree 9)')
plt.show()

# Training error on the combined dataset (no held-out data remains for testing)
train_error_all = mean_squared_error(y_all, poly_regressor.predict(X_all_poly))
print(f"Polynomial Regression (Degree 9, All Data) - Error: {train_error_all:.4f}")
print("Learned parameters:\n", poly_regressor.coef_)
print("Magnitude of learned parameters vector: ", np.linalg.norm(poly_regressor.coef_))

### Step 5: Generalization Using Ridge Regularization

Ridge regression modifies the linear (or polynomial) regression loss by adding a regularization term to penalize large weights. This helps prevent overfitting.

Loss Function:

$\text{Ridge Loss} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \alpha \sum_{j=1}^{p} w_j^2$
 
where:

- $\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$ is the MSE.
- $\sum_{j=1}^{p} w_j^2$ is the L2 regularization term.
- $\alpha$ is a regularization hyperparameter controlling the strength of the penalty.
- $w_j$ are the model parameters (weights).

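The regularization term penalizes large weights. Shrinking the weights toward zero yields a smoother fitted curve and discourages overfitting.

Note that scikit-learn's `Ridge` minimizes the sum of squared errors (not the mean) plus $\alpha \sum_j w_j^2$, so its `alpha` corresponds to $n\alpha$ in the formula above. By default it also leaves the intercept unpenalized.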
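
For intuition about what the penalty does to the solution, the sketch below solves a small ridge problem in closed form and compares the result with scikit-learn. It assumes the sum-of-squares objective $\|y - \Phi w\|^2 + \alpha \|w\|^2$ (the form scikit-learn's `Ridge` documents), whose minimizer is $w = (\Phi^\top \Phi + \alpha I)^{-1} \Phi^\top y$. The data, the degree-3 features, and the names `phi`, `alpha_demo`, and `w_closed_form` are illustrative assumptions. `fit_intercept=False` is set so that the constant column is penalized like every other weight, which is what this closed form does.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical small dataset, drawn with a local generator
rng = np.random.default_rng(0)
x_small = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y_small = np.sin(np.pi * x_small).ravel() + rng.normal(0, 0.2, size=20)

# Degree-3 design matrix, including the constant column
phi = PolynomialFeatures(degree=3).fit_transform(x_small)
alpha_demo = 1.0

# Closed form: solve (Phi^T Phi + alpha I) w = Phi^T y
gram = phi.T @ phi + alpha_demo * np.eye(phi.shape[1])
w_closed_form = np.linalg.solve(gram, phi.T @ y_small)

# scikit-learn on the same objective (no separate, unpenalized intercept)
ridge_demo = Ridge(alpha=alpha_demo, fit_intercept=False).fit(phi, y_small)

print("closed form :", w_closed_form)
print("scikit-learn:", ridge_demo.coef_)
```

Adding $\alpha I$ to $\Phi^\top \Phi$ makes the linear system better conditioned, which is one way to see why ridge solutions are less sensitive to noise than the unregularized fit.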


In [None]:
# Ridge regression with regularization (L2 penalty)
# Reuses the degree-9 features X_train_poly, X_test_poly, and X_true_poly from the cells above
ridge_regressor = Ridge(alpha=1.0)
ridge_regressor.fit(X_train_poly, y_train)

# Predict for train, test, and true data
y_train_pred_ridge = ridge_regressor.predict(X_train_poly)
y_test_pred_ridge = ridge_regressor.predict(X_test_poly)
y_true_pred_ridge = ridge_regressor.predict(X_true_poly)

# Plot the true curve, training points, and Ridge regression fit
plt.figure(figsize=(10, 6))
plt.plot(X_true, y_true, label='True Sine Wave', color='blue')
plt.scatter(X_train, y_train, color='red', label='Training Data (10 points)')
plt.plot(X_true, y_true_pred_ridge, label='Ridge Fit (Degree 9)', color='black')
plt.legend()
plt.title('Ridge Regularization: Polynomial Regression (Degree 9)')
plt.show()

# Evaluate the error with Ridge regularization
train_error_ridge = mean_squared_error(y_train, y_train_pred_ridge)
test_error_ridge = mean_squared_error(y_test, y_test_pred_ridge)
print(f"Ridge Regression (Degree 9) - Training Error: {train_error_ridge:.4f}, Test Error: {test_error_ridge:.4f}")
print("Learned parameters:\n", ridge_regressor.coef_)
print("Magnitude of learned parameters vector: ", np.linalg.norm(ridge_regressor.coef_))
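
The strength of the penalty is controlled by `alpha`, and a short sweep can show its effect on the weights. This is a sketch under stated assumptions: it uses its own 10-point noisy sine sample (local generator, seed 0), a hypothetical list `alpha_values`, and `solver="svd"`, chosen here because degree-9 features on $[-\pi, \pi]$ are poorly scaled.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical 10-point sample, similar in spirit to the notebook's training set
rng = np.random.default_rng(0)
x_pts = np.linspace(-np.pi, np.pi, 10).reshape(-1, 1)
y_pts = np.sin(x_pts).ravel() + rng.normal(0, 0.2, size=10)

x_pts_poly = PolynomialFeatures(degree=9).fit_transform(x_pts)

alpha_values = [1e-3, 1e-1, 1e1, 1e3]
coef_norms = []
train_mses = []
for alpha in alpha_values:
    # The SVD solver is a choice made for numerical stability in this sketch
    model = Ridge(alpha=alpha, solver="svd").fit(x_pts_poly, y_pts)
    coef_norms.append(np.linalg.norm(model.coef_))
    train_mses.append(mean_squared_error(y_pts, model.predict(x_pts_poly)))
    print(f"alpha={alpha:g}: ||w||={coef_norms[-1]:.4f}, train MSE={train_mses[-1]:.4f}")
```

A larger `alpha` trades a higher training error for smaller weights. In practice the value is usually chosen by checking the error on held-out data rather than on the training set.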


### Summary of Loss Functions
- Linear Regression: Minimizes the Mean Squared Error (MSE).
- Polynomial Regression: Same as linear regression but applied to polynomial features.
- Ridge Regression: Minimizes MSE with an additional L2 regularization term to prevent overfitting.