{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1 style=\"text-align:center;\">Gaussian Density Estimation Using Maximum Likelihood</h1>\n",
    "<p style=\"text-align:center;\">\n",
    "Nazar Khan\n",
    "<br>CVML Lab\n",
    "<br>University of The Punjab\n",
    "</p>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a Python tutorial on **Gaussian density estimation** that demonstrates the equivalence of maximizing likelihood with minimizing mean squared error (MSE). We solve a simple density estimation problem using maximum likelihood estimation (MLE).\n",
    "\n",
    "---\n",
    "\n",
    "#### **Objective:**\n",
    "**Maximum Likelihood Estimation (MLE):** We will show that maximizing the likelihood is equivalent to minimizing the mean squared error (MSE).\n",
    "\n",
    "We will work with a simple synthetic dataset generated from a normal distribution, and then use MLE to estimate its density.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Step 1: Import Libraries**\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from scipy.stats import norm\n",
    "\n",
    "# For visualization\n",
    "import seaborn as sns\n",
    "sns.set(style=\"whitegrid\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### **Step 2: Generate Synthetic Data**\n",
    "\n",
    "Let's generate some synthetic data from a normal distribution, which will be our target density to estimate.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Generating synthetic data\n",
    "#np.random.seed(42)\n",
    "data = np.random.normal(loc=0, scale=1, size=10)\n",
    "\n",
    "# Visualizing the data with a histogram\n",
    "plt.figure(figsize=(8, 4))\n",
    "sns.histplot(data, bins=20, kde=True, color=\"blue\", stat=\"density\")\n",
    "plt.title(\"Histogram of the Synthetic Data\")\n",
    "plt.xlabel(\"Data Values\")\n",
    "plt.ylabel(\"Density\")\n",
    "plt.show()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### **Step 3: Maximum Likelihood Estimation (MLE)**\n",
    "\n",
    "In MLE, we assume the data is generated from a normal distribution, and we want to estimate the parameters (mean $\\mu$ and variance $\\sigma^2$) by maximizing the likelihood function. \n",
    "\n",
    "The likelihood function is defined as:\n",
    "\n",
    "$\n",
    "L(\\mu, \\sigma^2) = \\prod_{i=1}^{N} \\frac{1}{\\sqrt{2\\pi\\sigma^2}} \\exp\\left(-\\frac{(x_i - \\mu)^2}{2\\sigma^2}\\right)\n",
    "$\n",
    "\n",
    "Taking the logarithm gives the log-likelihood, which has the same maximizer and is easier to work with:\n",
    "\n",
    "$\n",
    "\\log L(\\mu, \\sigma^2) = -\\frac{N}{2} \\log(2\\pi) - \\frac{N}{2} \\log(\\sigma^2) - \\frac{1}{2\\sigma^2} \\sum_{i=1}^{N} (x_i - \\mu)^2\n",
    "$\n",
    "\n",
    "The maximum likelihood estimates (MLE) for the mean and variance are:\n",
    "\n",
    "$\n",
    "\\hat{\\mu}_{MLE} = \\frac{1}{N} \\sum_{i=1}^{N} x_i\n",
    "$\n",
    "\n",
    "$\n",
    "\\hat{\\sigma}^2_{MLE} = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\hat{\\mu}_{MLE})^2\n",
    "$\n",
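    "\n",
    "Both estimates follow from setting the partial derivatives of the log-likelihood to zero. A brief sketch of the standard argument:\n",
    "\n",
    "$\n",
    "\\frac{\\partial \\log L}{\\partial \\mu} = \\frac{1}{\\sigma^2} \\sum_{i=1}^{N} (x_i - \\mu) = 0 \\quad\\Rightarrow\\quad \\mu = \\frac{1}{N} \\sum_{i=1}^{N} x_i\n",
    "$\n",
    "\n",
    "$\n",
    "\\frac{\\partial \\log L}{\\partial \\sigma^2} = -\\frac{N}{2\\sigma^2} + \\frac{1}{2\\sigma^4} \\sum_{i=1}^{N} (x_i - \\mu)^2 = 0 \\quad\\Rightarrow\\quad \\sigma^2 = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\mu)^2\n",
    "$\n",
    "\n",
    "Substituting $\\hat{\\mu}_{MLE}$ for $\\mu$ in the second equation gives $\\hat{\\sigma}^2_{MLE}$.\n",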
    "\n",
    "For any fixed $\\sigma^2$, maximizing the log-likelihood with respect to $\\mu$ is equivalent to minimizing the **mean squared error (MSE)** between the data and $\\mu$:\n",
    "\n",
    "$\n",
    "\\text{MSE} = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\mu)^2\n",
    "$\n",
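    "\n",
    "A short sketch of why this holds: the first two terms of the log-likelihood do not depend on $\\mu$, and the third equals $-\\frac{N}{2\\sigma^2}$ times the MSE. So for any fixed $\\sigma^2 > 0$:\n",
    "\n",
    "$\n",
    "\\arg\\max_{\\mu} \\log L(\\mu, \\sigma^2) = \\arg\\max_{\\mu} \\left[ -\\frac{N}{2\\sigma^2} \\, \\text{MSE} \\right] = \\arg\\min_{\\mu} \\text{MSE}\n",
    "$\n",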
    "\n",
    "#### **MLE Solution:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
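    "# MLE estimates for the mean and variance\n",
    "mu_mle = np.mean(data)\n",
    "# Note: despite its name, sigma_mle holds the variance (sigma^2), not the standard deviation.\n",
    "# np.var uses ddof=0 by default, which matches the 1/N MLE formula.\n",
    "sigma_mle = np.var(data)\n",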
    "\n",
    "# Displaying the results\n",
    "print(f\"MLE Estimate for Mean: {mu_mle}\")\n",
    "print(f\"MLE Estimate for Variance: {sigma_mle}\")"
   ]
  },
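  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The equivalence between maximizing the likelihood and minimizing MSE can also be checked numerically. The cell below is a minimal sketch rather than part of the main analysis: it draws its own seeded sample (`sample`, so it does not depend on `data`), holds the variance fixed at an assumed value `fixed_var = 1.0`, and evaluates both criteria on a grid of candidate means. The helper names `gaussian_log_likelihood` and `mse` are illustrative. The two selected means should coincide and lie within one grid step of the sample mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Illustrative sample, seeded so that the check is reproducible\n",
    "rng = np.random.default_rng(0)\n",
    "sample = rng.normal(loc=0.0, scale=1.0, size=10)\n",
    "\n",
    "def gaussian_log_likelihood(x, mu, var):\n",
    "    # Log-likelihood of x under N(mu, var), term by term as in Step 3\n",
    "    n = x.size\n",
    "    return (-0.5 * n * np.log(2 * np.pi)\n",
    "            - 0.5 * n * np.log(var)\n",
    "            - np.sum((x - mu) ** 2) / (2 * var))\n",
    "\n",
    "def mse(x, mu):\n",
    "    # Mean squared error between the data and a candidate mean\n",
    "    return np.mean((x - mu) ** 2)\n",
    "\n",
    "# Evaluate both criteria on a grid of candidate means, variance held fixed (assumed value)\n",
    "candidate_mus = np.linspace(-2, 2, 4001)\n",
    "fixed_var = 1.0\n",
    "log_liks = np.array([gaussian_log_likelihood(sample, m, fixed_var) for m in candidate_mus])\n",
    "mses = np.array([mse(sample, m) for m in candidate_mus])\n",
    "\n",
    "best_by_likelihood = candidate_mus[np.argmax(log_liks)]\n",
    "best_by_mse = candidate_mus[np.argmin(mses)]\n",
    "\n",
    "print(f\"Mean maximizing the log-likelihood: {best_by_likelihood:.3f}\")\n",
    "print(f\"Mean minimizing the MSE:            {best_by_mse:.3f}\")\n",
    "print(f\"Sample mean:                        {np.mean(sample):.3f}\")"
   ]
  },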
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "### **Step 4: Visualize the Results**\n",
    "\n",
    "Let's compare the MLE density estimate with the true distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generating the density estimates\n",
    "x_values = np.linspace(-3, 3, 100)\n",
    "# scale expects a standard deviation, so take the square root of the variance estimate\n",
    "mle_density = norm.pdf(x_values, loc=mu_mle, scale=np.sqrt(sigma_mle))\n",
    "true_density = norm.pdf(x_values, loc=0, scale=1)  # True distribution\n",
    "\n",
    "# Plotting the densities\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(x_values, true_density, label=\"True Density\", color=\"black\", linestyle=\"--\")\n",
    "plt.plot(x_values, mle_density, label=\"MLE Estimate\", color=\"blue\")\n",
    "plt.title(\"Density Estimation: MLE vs True Density\")\n",
    "plt.xlabel(\"x\")\n",
    "plt.ylabel(\"Density\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "---\n",
    "\n",
    "### **Step 5: Conclusion**\n",
    "\n",
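    "- **MLE**: Maximizes the likelihood; for the Gaussian mean, this is equivalent to minimizing MSE. The resulting estimate fits the observed data well, but with few samples (here $N = 10$) it can deviate noticeably from the true parameters, and the $\\frac{1}{N}$ variance estimate is biased low, with expected value $\\frac{N-1}{N}\\sigma^2$."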
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cvml_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}