{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "

Gaussian Density Estimation Using Maximum Likelihood

\n", "

\n", "Nazar Khan\n", "
CVML Lab\n", "
University of The Punjab\n", "

" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is a Python tutorial on **Gaussian density estimation**. It demonstrates that maximizing the Gaussian likelihood is equivalent to minimizing the mean squared error (MSE). We solve a simple density estimation problem using maximum likelihood estimation (MLE).\n", "\n", "---\n", "\n", "#### **Objective:**\n", "**Maximum Likelihood Estimation (MLE):** We will show that maximizing the likelihood is equivalent to minimizing the mean squared error (MSE).\n", "\n", "We will work with a simple synthetic dataset generated from a normal distribution, and then use MLE for density estimation.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 1: Import Libraries**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy.stats import norm\n", "\n", "# For visualization\n", "import seaborn as sns\n", "sns.set(style=\"whitegrid\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **Step 2: Generate Synthetic Data**\n", "\n", "Let's generate some synthetic data from a standard normal distribution (mean 0, standard deviation 1), which will be our target density to estimate.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# Generating synthetic data: 10 samples from N(0, 1)\n", "# np.random.seed(42)  # Uncomment for reproducible results\n", "data = np.random.normal(loc=0, scale=1, size=10)\n", "\n", "# Visualizing the data with a histogram\n", "plt.figure(figsize=(8, 4))\n", "sns.histplot(data, bins=20, kde=True, color=\"blue\", stat=\"density\")\n", "plt.title(\"Histogram of the Synthetic Data\")\n", "plt.xlabel(\"Data Values\")\n", "plt.ylabel(\"Density\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **Step 3: Maximum Likelihood Estimation (MLE)**\n", "\n", "In MLE, we 
assume the data is generated from a normal distribution, and we want to estimate its parameters (mean $\\mu$ and variance $\\sigma^2$) by maximizing the likelihood function.\n", "\n", "The likelihood function is defined as:\n", "\n", "$\n", "L(\\mu, \\sigma^2) = \\prod_{i=1}^{N} \\frac{1}{\\sqrt{2\\pi\\sigma^2}} \\exp\\left(-\\frac{(x_i - \\mu)^2}{2\\sigma^2}\\right)\n", "$\n", "\n", "Since the logarithm is monotonic, we can equivalently maximize the log-likelihood:\n", "\n", "$\n", "\\log L(\\mu, \\sigma^2) = -\\frac{N}{2} \\log(2\\pi) - \\frac{N}{2} \\log(\\sigma^2) - \\frac{1}{2\\sigma^2} \\sum_{i=1}^{N} (x_i - \\mu)^2\n", "$\n", "\n", "Setting the derivatives with respect to $\\mu$ and $\\sigma^2$ to zero gives the maximum likelihood estimates (MLE) for the mean and variance:\n", "\n", "$\n", "\\hat{\\mu}_{MLE} = \\frac{1}{N} \\sum_{i=1}^{N} x_i\n", "$\n", "\n", "$\n", "\\hat{\\sigma}^2_{MLE} = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\hat{\\mu}_{MLE})^2\n", "$\n", "\n", "Only the last term of the log-likelihood depends on $\\mu$, and it enters with a negative sign. For any fixed $\\sigma^2$, maximizing the log-likelihood with respect to $\\mu$ is therefore equivalent to minimizing the **mean squared error (MSE)**:\n", "\n", "$\n", "\\text{MSE} = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\mu)^2\n", "$\n", "\n", "The minimum value of the MSE, attained at $\\mu = \\hat{\\mu}_{MLE}$, is exactly $\\hat{\\sigma}^2_{MLE}$.\n", "\n", "#### **MLE Solution:**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# MLE estimation for the mean and variance\n", "mu_mle = np.mean(data)\n", "# np.var divides by N by default (ddof=0), which matches the MLE formula.\n", "# Note: despite its name, sigma_mle holds the variance, not the standard deviation.\n", "sigma_mle = np.var(data)\n", "\n", "# Displaying the results\n", "print(f\"MLE Estimate for Mean: {mu_mle}\")\n", "print(f\"MLE Estimate for Variance: {sigma_mle}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **Step 5: Visualize the Results**\n", "\n", "Let's compare the MLE density estimate with the true distribution." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generating the density estimates\n", "x_values = np.linspace(-3, 3, 100)\n", "# sigma_mle holds the variance, so take the square root to get the standard deviation\n", "mle_density = norm.pdf(x_values, loc=mu_mle, scale=np.sqrt(sigma_mle))\n", "true_density = norm.pdf(x_values, loc=0, scale=1) # True distribution\n", "\n", "# Plotting the densities\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(x_values, true_density, label=\"True Density\", color=\"black\", linestyle=\"--\")\n", "plt.plot(x_values, mle_density, label=\"MLE Estimate\", color=\"blue\")\n", "plt.title(\"Density Estimation: MLE Estimate vs True Density\")\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"Density\")\n", "plt.legend()\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "---\n", "\n", "### **Step 6: Conclusion**\n", "\n", "- **MLE**: Maximizes the likelihood, which for the mean of a Gaussian is equivalent to minimizing the MSE. The resulting estimate fits the observed data well but may overfit when data is sparse: with only 10 samples, the estimated mean and variance can differ noticeably from the true values ($\\mu = 0$, $\\sigma^2 = 1$), and the MLE variance underestimates the true variance on average by a factor of $(N-1)/N$." ] } ], "metadata": { "kernelspec": { "display_name": "cvml_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }