Reinforcement Learning: Balancing a CartPole using Q-Learning
\n",
"\n",
"Nazar Khan<br>\n",
"CVML Lab<br>\n",
"University of The Punjab"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial helps you get started with Gymnasium (the maintained successor to OpenAI Gym, developed by the Farama Foundation) for reinforcement learning (RL). It offers a visual, hands-on experience in which you can watch an agent learn in a simple environment. We'll use **CartPole**, one of the classic RL environments, as the running example.\n",
"\n",
"### **Getting Started with Gymnasium: A Visual Tutorial**\n",
"\n",
"In this tutorial, you will learn how to:\n",
"1. Install Gymnasium and its dependencies.\n",
"2. Understand the CartPole environment.\n",
"3. Create and train a reinforcement learning agent using Q-learning.\n",
"4. Visualize how the agent learns over time.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### **Step 1: Install Gymnasium and Dependencies**\n",
"\n",
"First, install **Gymnasium** (the successor to OpenAI Gym) and a few other dependencies.\n",
"\n",
"#### Install the necessary libraries:\n",
"```bash\n",
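"pip install \"gymnasium[all]\" numpy matplotlib\n",
"```\n",
"\n",
"- `gymnasium[all]`: installs Gymnasium together with every optional environment family and its extra dependencies; the quotes stop shells such as zsh from interpreting the square brackets. CartPole alone only needs `gymnasium[classic-control]`, which adds `pygame` for rendering.\n",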
"- `numpy`: For array and matrix manipulations.\n",
"- `matplotlib`: For visualizing the training process."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### **Step 2: Import Libraries and Set Up the CartPole Environment**\n",
"\n",
"Let’s start by importing the necessary libraries and initializing the **CartPole** environment."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Observation Space: Box([-4.8 -inf -0.41887903 -inf], [4.8 inf 0.41887903 inf], (4,), float32)\n",
"Action Space: Discrete(2)\n"
]
}
],
"source": [
"import gymnasium as gym\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.animation import FuncAnimation\n",
"import time\n",
"\n",
"# Create the CartPole environment\n",
"env = gym.make(\"CartPole-v1\", render_mode='rgb_array')\n",
"print(env)\n",
"\n",
"# Reset the environment to start\n",
"observation, info = env.reset()\n",
"print(\"Observation Space:\", env.observation_space)\n",
"print(\"Action Space:\", env.action_space)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- `gym.make(\"CartPole-v1\", render_mode='rgb_array')`: creates the CartPole environment.\n",
"- `render_mode='rgb_array'`: makes `env.render()` return each frame as an RGB NumPy array rather than opening a window, so frames can be displayed with matplotlib. Use `render_mode='human'` if you want a live window.\n",
"- `env.reset()`: resets the environment to a randomized initial state and returns the first observation together with an `info` dictionary.\n",
"\n",
"The output should display information about the observation and action spaces. For CartPole:\n",
"- **Observation space** is a continuous `Box` with 4 elements: cart position, cart velocity, pole angle (in radians), and pole angular velocity.\n",
"- **Action space** is discrete: 0 (push the cart left) or 1 (push the cart right)."
]
},
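The bounds printed above only limit what an observation can contain; an episode actually ends much earlier. According to the Gymnasium CartPole documentation, an episode terminates when the cart position leaves ±2.4 or the pole angle exceeds ±12° (about 0.2094 rad), every step yields a reward of +1, and `CartPole-v1` truncates episodes at 500 steps. The helper below is a hypothetical sketch (not part of Gymnasium's API) that restates the termination test for a single observation:

```python
import math

# Termination thresholds documented for Gymnasium's CartPole
X_THRESHOLD = 2.4                          # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360   # 12 degrees in radians, about 0.2094

def is_terminal(observation):
    """Hypothetical helper: True if the cart left the track or the pole fell."""
    x, x_dot, theta, theta_dot = observation
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD

print(is_terminal([0.0, 0.0, 0.05, 0.0]))  # False: pole nearly upright
print(is_terminal([0.0, 0.0, 0.25, 0.0]))  # True: 0.25 rad is past 12 degrees
print(is_terminal([2.5, 0.0, 0.0, 0.0]))   # True: cart is past 2.4
```

Because the reward is +1 per step and episodes are cut off at 500 steps, a total reward of 500 in the training log means the pole stayed up until truncation.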
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### **Step 3: Define Q-Learning Algorithm**\n",
"\n",
"We'll now define a simple **Q-learning** algorithm for training the agent to balance the pole.\n",
"\n",
"#### Key elements for Q-learning:\n",
"1. **Q-table**: A table that stores Q-values for each state-action pair.\n",
"2. **Learning Rate (α)**: Controls how far each Q-value moves toward its new target on every update (0 < α ≤ 1).\n",
"3. **Discount Factor (γ)**: Weights future rewards relative to immediate ones; values close to 1 make the agent far-sighted.\n",
"4. **Exploration Rate (ε)**: In an ε-greedy policy, the probability of taking a random action (exploration) instead of the action with the highest Q-value (exploitation)."
]
},
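For reference, these parameters enter the standard tabular Q-learning update and the ε-greedy action rule shown below. Here s' is the next state and r the reward received; when s' is terminal, the max term is usually dropped so the target is just r.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big]

a = \begin{cases}
  \text{a uniformly random action} & \text{with probability } \varepsilon \\
  \arg\max_{a} Q(s, a)             & \text{with probability } 1 - \varepsilon
\end{cases}
```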
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of Q-table: (24, 24, 24, 24, 2)\n"
]
}
],
"source": [
"# Parameters for Q-Learning\n",
"alpha = 0.1 # Learning rate\n",
"gamma = 0.99 # Discount factor\n",
"epsilon = 0.1 # Exploration rate\n",
"n_episodes = 30000 # Number of episodes for training\n",
"\n",
"# Initialize Q-table (for discrete states)\n",
"n_actions = env.action_space.n\n",
"q_table = np.zeros((24, 24, 24, 24, n_actions)) # For CartPole, discretized states (4D)\n",
"print(\"Shape of Q-table: \", q_table.shape)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"- **Discretizing the continuous state space**: CartPole's state space is continuous, but a Q-table needs a finite set of states, so we discretize it. Each of the 4 state dimensions is divided into 24 bins, giving 24^4 = 331,776 discrete states with 2 Q-values each."
]
},
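A quick back-of-the-envelope check of what 24 bins per dimension implies for the table size, assuming NumPy's default `float64` dtype (which `np.zeros` uses). The variable names are illustrative, and `demo_table` is deliberately not called `q_table` so that running this sketch does not overwrite the notebook's table:

```python
import numpy as np

n_bins = 24       # bins per state dimension, as in the cell above
n_dims = 4        # cart position, cart velocity, pole angle, pole angular velocity
n_actions = 2     # push left / push right

n_states = n_bins ** n_dims       # distinct discrete states
n_entries = n_states * n_actions  # Q-values stored in the table
demo_table = np.zeros((n_bins,) * n_dims + (n_actions,))  # float64 by default

print(n_states)           # 331776
print(n_entries)          # 663552
print(demo_table.nbytes)  # 5308416 bytes, about 5.3 MB
```

The memory cost is modest, but every one of those Q-values starts at zero and only changes when its state is visited, so many cells may be updated rarely or never during training.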
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### **Step 4: Discretize the Continuous State Space**\n",
"\n",
"To apply tabular Q-learning, we need to convert each continuous state into a discrete one. We use `np.linspace` to create evenly spaced bin edges for each dimension and `np.digitize` to find which bin a value falls into."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Define state-space boundaries: 24 evenly spaced bin edges per dimension\n",
"state_bins = [\n",
"    np.linspace(-2.4, 2.4, 24),  # Cart position\n",
"    np.linspace(-3.0, 3.0, 24),  # Cart velocity\n",
"    np.linspace(-0.5, 0.5, 24),  # Pole angle (radians)\n",
"    np.linspace(-2.0, 2.0, 24)   # Pole angular velocity\n",
"]\n",
"\n",
"def discretize_state(state):\n",
"    \"\"\"\n",
"    Map a continuous 4D observation to a tuple of Q-table indices.\n",
"    \"\"\"\n",
"    state_discretized = []\n",
"    for s, bins in zip(state, state_bins):\n",
"        # np.digitize returns i with bins[i-1] <= s < bins[i]; subtract 1 for a 0-based index.\n",
"        # Values below the first edge would give -1, which NumPy treats as the last bin,\n",
"        # so clip every index into the valid range 0..len(bins)-1.\n",
"        idx = np.digitize(s, bins) - 1\n",
"        state_discretized.append(int(np.clip(idx, 0, len(bins) - 1)))\n",
"    return tuple(state_discretized)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
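"- `np.digitize(s, bins)` returns the index `i` such that `bins[i-1] <= s < bins[i]`; subtracting 1 turns it into a 0-based bin index.\n",
"- Applied to all 4 dimensions, this maps a continuous observation to a tuple of 4 indices that selects the pair of action values in the Q-table. Note that the pole-angle edges span ±0.5 rad, while episodes end once the angle exceeds about ±0.21 rad, so the outer angle bins are rarely, if ever, visited."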
]
},
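The edge cases of `np.digitize` matter for indexing. It returns 0 for values below the first edge and `len(bins)` for values at or above the last edge, so `np.digitize(s, bins) - 1` can produce -1, which NumPy's negative indexing silently maps to the *last* bin. This sketch uses the cart-position edges from above with a few illustrative values to show the raw indices and a clipped version that stays within 0..23:

```python
import numpy as np

bins = np.linspace(-2.4, 2.4, 24)  # same cart-position edges as in state_bins

samples = np.array([-3.0, -2.4, 0.0, 2.39, 2.4, 3.0])  # illustrative cart positions
raw = np.digitize(samples, bins) - 1       # -1 for values below bins[0]
clipped = np.clip(raw, 0, len(bins) - 1)   # force every index into 0..23

q_axis = np.arange(24)                     # stand-in for one axis of the Q-table
print(raw)                                 # first entry is -1
print(q_axis[raw[0]] == q_axis[23])        # True: -1 aliases the last bin
print(clipped)                             # first entry is now 0
```

Without clipping, a cart far to the left would share Q-values with a cart far to the right, which is exactly the kind of aliasing a tabular method cannot recover from.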
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"---\n",
"\n",
"### **Step 5: Train the Agent with Q-learning**\n",
"\n",
"Now we implement the Q-learning training loop. At each step of an episode, the agent will:\n",
"1. Choose an action based on an ε-greedy policy.\n",
"2. Take the action and observe the new state and reward.\n",
"3. Update the Q-table using the Q-learning update rule."
]
},
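A minimal sketch of the two per-step operations, assuming the Q-table is a NumPy array indexed by the state tuple from `discretize_state` followed by the action. The names `choose_action` and `q_update` are hypothetical, and the notebook's own training cell may be organised differently. Following common practice with Gymnasium, whose `env.step()` returns separate `terminated` and `truncated` flags, the update stops bootstrapping only on true termination, not when an episode is cut off by the time limit.

```python
import numpy as np

def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: a random action with probability epsilon, else the greedy one."""
    n_actions = q_table.shape[-1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(q_table[state]))     # exploit (ties go to the lowest index)

def q_update(q_table, state, action, reward, next_state, terminated, alpha, gamma):
    """Apply one tabular Q-learning update in place."""
    # Do not bootstrap from a terminal state: its future value is zero.
    best_next = 0.0 if terminated else np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    idx = state + (action,)                   # full index: state bins + action
    q_table[idx] += alpha * (td_target - q_table[idx])

# Tiny demo on a 2x2 "state grid" with 2 actions (states are index tuples)
rng = np.random.default_rng(0)
q = np.zeros((2, 2, 2))
s, s_next = (0, 1), (1, 0)
q[s_next] = [0.5, 2.0]                         # pretend s_next is already valued
a = choose_action(q, s, epsilon=0.0, rng=rng)  # epsilon=0 means always greedy
q_update(q, s, a, reward=1.0, next_state=s_next, terminated=False, alpha=0.1, gamma=0.99)
print(a, q[s])  # Q[s, 0] becomes 0.1 * (1.0 + 0.99 * 2.0 - 0.0) = 0.298
```

In a Gymnasium loop, `reward` and `terminated` would come from `observation, reward, terminated, truncated, info = env.step(action)`, and the episode ends when either flag is true.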
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Episode 0/30000, Total Reward: 10.0\n",
"Episode 50/30000, Total Reward: 13.0\n",
"Episode 100/30000, Total Reward: 9.0\n",
"Episode 150/30000, Total Reward: 11.0\n",
"Episode 200/30000, Total Reward: 11.0\n",
"Episode 250/30000, Total Reward: 10.0\n",
"Episode 300/30000, Total Reward: 16.0\n",
"Episode 350/30000, Total Reward: 14.0\n",
"Episode 400/30000, Total Reward: 9.0\n",
"Episode 450/30000, Total Reward: 10.0\n",
"Episode 500/30000, Total Reward: 12.0\n",
"Episode 550/30000, Total Reward: 9.0\n",
"Episode 600/30000, Total Reward: 10.0\n",
"Episode 650/30000, Total Reward: 15.0\n",
"Episode 700/30000, Total Reward: 10.0\n",
"Episode 750/30000, Total Reward: 15.0\n",
"Episode 800/30000, Total Reward: 13.0\n",
"Episode 850/30000, Total Reward: 8.0\n",
"Episode 900/30000, Total Reward: 15.0\n",
"Episode 950/30000, Total Reward: 15.0\n",
"Episode 1000/30000, Total Reward: 15.0\n",
"Episode 1050/30000, Total Reward: 11.0\n",
"Episode 1100/30000, Total Reward: 16.0\n",
"Episode 1150/30000, Total Reward: 14.0\n",
"Episode 1200/30000, Total Reward: 18.0\n",
"Episode 1250/30000, Total Reward: 15.0\n",
"Episode 1300/30000, Total Reward: 19.0\n",
"Episode 1350/30000, Total Reward: 20.0\n",
"Episode 1400/30000, Total Reward: 11.0\n",
"Episode 1450/30000, Total Reward: 18.0\n",
"Episode 1500/30000, Total Reward: 37.0\n",
"Episode 1550/30000, Total Reward: 17.0\n",
"Episode 1600/30000, Total Reward: 14.0\n",
"Episode 1650/30000, Total Reward: 11.0\n",
"Episode 1700/30000, Total Reward: 16.0\n",
"Episode 1750/30000, Total Reward: 12.0\n",
"Episode 1800/30000, Total Reward: 16.0\n",
"Episode 1850/30000, Total Reward: 21.0\n",
"Episode 1900/30000, Total Reward: 13.0\n",
"Episode 1950/30000, Total Reward: 29.0\n",
"Episode 2000/30000, Total Reward: 14.0\n",
"Episode 2050/30000, Total Reward: 19.0\n",
"Episode 2100/30000, Total Reward: 27.0\n",
"Episode 2150/30000, Total Reward: 57.0\n",
"Episode 2200/30000, Total Reward: 74.0\n",
"Episode 2250/30000, Total Reward: 32.0\n",
"Episode 2300/30000, Total Reward: 57.0\n",
"Episode 2350/30000, Total Reward: 88.0\n",
"Episode 2400/30000, Total Reward: 59.0\n",
"Episode 2450/30000, Total Reward: 42.0\n",
"Episode 2500/30000, Total Reward: 54.0\n",
"Episode 2550/30000, Total Reward: 36.0\n",
"Episode 2600/30000, Total Reward: 44.0\n",
"Episode 2650/30000, Total Reward: 76.0\n",
"Episode 2700/30000, Total Reward: 35.0\n",
"Episode 2750/30000, Total Reward: 55.0\n",
"Episode 2800/30000, Total Reward: 34.0\n",
"Episode 2850/30000, Total Reward: 60.0\n",
"Episode 2900/30000, Total Reward: 76.0\n",
"Episode 2950/30000, Total Reward: 35.0\n",
"Episode 3000/30000, Total Reward: 51.0\n",
"Episode 3050/30000, Total Reward: 39.0\n",
"Episode 3100/30000, Total Reward: 36.0\n",
"Episode 3150/30000, Total Reward: 79.0\n",
"Episode 3200/30000, Total Reward: 63.0\n",
"Episode 3250/30000, Total Reward: 50.0\n",
"Episode 3300/30000, Total Reward: 76.0\n",
"Episode 3350/30000, Total Reward: 87.0\n",
"Episode 3400/30000, Total Reward: 33.0\n",
"Episode 3450/30000, Total Reward: 85.0\n",
"Episode 3500/30000, Total Reward: 42.0\n",
"Episode 3550/30000, Total Reward: 38.0\n",
"Episode 3600/30000, Total Reward: 55.0\n",
"Episode 3650/30000, Total Reward: 61.0\n",
"Episode 3700/30000, Total Reward: 41.0\n",
"Episode 3750/30000, Total Reward: 64.0\n",
"Episode 3800/30000, Total Reward: 68.0\n",
"Episode 3850/30000, Total Reward: 76.0\n",
"Episode 3900/30000, Total Reward: 89.0\n",
"Episode 3950/30000, Total Reward: 67.0\n",
"Episode 4000/30000, Total Reward: 156.0\n",
"Episode 4050/30000, Total Reward: 65.0\n",
"Episode 4100/30000, Total Reward: 163.0\n",
"Episode 4150/30000, Total Reward: 117.0\n",
"Episode 4200/30000, Total Reward: 51.0\n",
"Episode 4250/30000, Total Reward: 75.0\n",
"Episode 4300/30000, Total Reward: 55.0\n",
"Episode 4350/30000, Total Reward: 51.0\n",
"Episode 4400/30000, Total Reward: 193.0\n",
"Episode 4450/30000, Total Reward: 48.0\n",
"Episode 4500/30000, Total Reward: 180.0\n",
"Episode 4550/30000, Total Reward: 56.0\n",
"Episode 4600/30000, Total Reward: 203.0\n",
"Episode 4650/30000, Total Reward: 57.0\n",
"Episode 4700/30000, Total Reward: 89.0\n",
"Episode 4750/30000, Total Reward: 105.0\n",
"Episode 4800/30000, Total Reward: 110.0\n",
"Episode 4850/30000, Total Reward: 72.0\n",
"Episode 4900/30000, Total Reward: 73.0\n",
"Episode 4950/30000, Total Reward: 127.0\n",
"Episode 5000/30000, Total Reward: 44.0\n",
"Episode 5050/30000, Total Reward: 67.0\n",
"Episode 5100/30000, Total Reward: 76.0\n",
"Episode 5150/30000, Total Reward: 107.0\n",
"Episode 5200/30000, Total Reward: 154.0\n",
"Episode 5250/30000, Total Reward: 71.0\n",
"Episode 5300/30000, Total Reward: 52.0\n",
"Episode 5350/30000, Total Reward: 80.0\n",
"Episode 5400/30000, Total Reward: 78.0\n",
"Episode 5450/30000, Total Reward: 79.0\n",
"Episode 5500/30000, Total Reward: 99.0\n",
"Episode 5550/30000, Total Reward: 114.0\n",
"Episode 5600/30000, Total Reward: 70.0\n",
"Episode 5650/30000, Total Reward: 98.0\n",
"Episode 5700/30000, Total Reward: 307.0\n",
"Episode 5750/30000, Total Reward: 82.0\n",
"Episode 5800/30000, Total Reward: 130.0\n",
"Episode 5850/30000, Total Reward: 21.0\n",
"Episode 5900/30000, Total Reward: 72.0\n",
"Episode 5950/30000, Total Reward: 57.0\n",
"Episode 6000/30000, Total Reward: 44.0\n",
"Episode 6050/30000, Total Reward: 95.0\n",
"Episode 6100/30000, Total Reward: 58.0\n",
"Episode 6150/30000, Total Reward: 103.0\n",
"Episode 6200/30000, Total Reward: 59.0\n",
"Episode 6250/30000, Total Reward: 72.0\n",
"Episode 6300/30000, Total Reward: 133.0\n",
"Episode 6350/30000, Total Reward: 21.0\n",
"Episode 6400/30000, Total Reward: 44.0\n",
"Episode 6450/30000, Total Reward: 73.0\n",
"Episode 6500/30000, Total Reward: 92.0\n",
"Episode 6550/30000, Total Reward: 82.0\n",
"Episode 6600/30000, Total Reward: 57.0\n",
"Episode 6650/30000, Total Reward: 30.0\n",
"Episode 6700/30000, Total Reward: 93.0\n",
"Episode 6750/30000, Total Reward: 57.0\n",
"Episode 6800/30000, Total Reward: 170.0\n",
"Episode 6850/30000, Total Reward: 62.0\n",
"Episode 6900/30000, Total Reward: 52.0\n",
"Episode 6950/30000, Total Reward: 143.0\n",
"Episode 7000/30000, Total Reward: 69.0\n",
"Episode 7050/30000, Total Reward: 113.0\n",
"Episode 7100/30000, Total Reward: 115.0\n",
"Episode 7150/30000, Total Reward: 94.0\n",
"Episode 7200/30000, Total Reward: 103.0\n",
"Episode 7250/30000, Total Reward: 116.0\n",
"Episode 7300/30000, Total Reward: 91.0\n",
"Episode 7350/30000, Total Reward: 87.0\n",
"Episode 7400/30000, Total Reward: 48.0\n",
"Episode 7450/30000, Total Reward: 110.0\n",
"Episode 7500/30000, Total Reward: 106.0\n",
"Episode 7550/30000, Total Reward: 117.0\n",
"Episode 7600/30000, Total Reward: 158.0\n",
"Episode 7650/30000, Total Reward: 45.0\n",
"Episode 7700/30000, Total Reward: 61.0\n",
"Episode 7750/30000, Total Reward: 172.0\n",
"Episode 7800/30000, Total Reward: 207.0\n",
"Episode 7850/30000, Total Reward: 97.0\n",
"Episode 7900/30000, Total Reward: 55.0\n",
"Episode 7950/30000, Total Reward: 69.0\n",
"Episode 8000/30000, Total Reward: 87.0\n",
"Episode 8050/30000, Total Reward: 197.0\n",
"Episode 8100/30000, Total Reward: 70.0\n",
"Episode 8150/30000, Total Reward: 101.0\n",
"Episode 8200/30000, Total Reward: 31.0\n",
"Episode 8250/30000, Total Reward: 141.0\n",
"Episode 8300/30000, Total Reward: 161.0\n",
"Episode 8350/30000, Total Reward: 45.0\n",
"Episode 8400/30000, Total Reward: 53.0\n",
"Episode 8450/30000, Total Reward: 111.0\n",
"Episode 8500/30000, Total Reward: 170.0\n",
"Episode 8550/30000, Total Reward: 175.0\n",
"Episode 8600/30000, Total Reward: 89.0\n",
"Episode 8650/30000, Total Reward: 118.0\n",
"Episode 8700/30000, Total Reward: 92.0\n",
"Episode 8750/30000, Total Reward: 119.0\n",
"Episode 8800/30000, Total Reward: 98.0\n",
"Episode 8850/30000, Total Reward: 163.0\n",
"Episode 8900/30000, Total Reward: 56.0\n",
"Episode 8950/30000, Total Reward: 88.0\n",
"Episode 9000/30000, Total Reward: 96.0\n",
"Episode 9050/30000, Total Reward: 83.0\n",
"Episode 9100/30000, Total Reward: 73.0\n",
"Episode 9150/30000, Total Reward: 151.0\n",
"Episode 9200/30000, Total Reward: 155.0\n",
"Episode 9250/30000, Total Reward: 98.0\n",
"Episode 9300/30000, Total Reward: 157.0\n",
"Episode 9350/30000, Total Reward: 113.0\n",
"Episode 9400/30000, Total Reward: 118.0\n",
"Episode 9450/30000, Total Reward: 93.0\n",
"Episode 9500/30000, Total Reward: 113.0\n",
"Episode 9550/30000, Total Reward: 45.0\n",
"Episode 9600/30000, Total Reward: 106.0\n",
"Episode 9650/30000, Total Reward: 110.0\n",
"Episode 9700/30000, Total Reward: 116.0\n",
"Episode 9750/30000, Total Reward: 168.0\n",
"Episode 9800/30000, Total Reward: 139.0\n",
"Episode 9850/30000, Total Reward: 115.0\n",
"Episode 9900/30000, Total Reward: 59.0\n",
"Episode 9950/30000, Total Reward: 67.0\n",
"Episode 10000/30000, Total Reward: 197.0\n",
"Episode 10050/30000, Total Reward: 102.0\n",
"Episode 10100/30000, Total Reward: 151.0\n",
"Episode 10150/30000, Total Reward: 115.0\n",
"Episode 10200/30000, Total Reward: 135.0\n",
"Episode 10250/30000, Total Reward: 136.0\n",
"Episode 10300/30000, Total Reward: 156.0\n",
"Episode 10350/30000, Total Reward: 204.0\n",
"Episode 10400/30000, Total Reward: 92.0\n",
"Episode 10450/30000, Total Reward: 119.0\n",
"Episode 10500/30000, Total Reward: 101.0\n",
"Episode 10550/30000, Total Reward: 117.0\n",
"Episode 10600/30000, Total Reward: 107.0\n",
"Episode 10650/30000, Total Reward: 94.0\n",
"Episode 10700/30000, Total Reward: 88.0\n",
"Episode 10750/30000, Total Reward: 112.0\n",
"Episode 10800/30000, Total Reward: 114.0\n",
"Episode 10850/30000, Total Reward: 159.0\n",
"Episode 10900/30000, Total Reward: 102.0\n",
"Episode 10950/30000, Total Reward: 111.0\n",
"Episode 11000/30000, Total Reward: 53.0\n",
"Episode 11050/30000, Total Reward: 111.0\n",
"Episode 11100/30000, Total Reward: 26.0\n",
"Episode 11150/30000, Total Reward: 58.0\n",
"Episode 11200/30000, Total Reward: 22.0\n",
"Episode 11250/30000, Total Reward: 81.0\n",
"Episode 11300/30000, Total Reward: 49.0\n",
"Episode 11350/30000, Total Reward: 19.0\n",
"Episode 11400/30000, Total Reward: 52.0\n",
"Episode 11450/30000, Total Reward: 57.0\n",
"Episode 11500/30000, Total Reward: 66.0\n",
"Episode 11550/30000, Total Reward: 85.0\n",
"Episode 11600/30000, Total Reward: 76.0\n",
"Episode 11650/30000, Total Reward: 92.0\n",
"Episode 11700/30000, Total Reward: 36.0\n",
"Episode 11750/30000, Total Reward: 107.0\n",
"Episode 11800/30000, Total Reward: 43.0\n",
"Episode 11850/30000, Total Reward: 108.0\n",
"Episode 11900/30000, Total Reward: 106.0\n",
"Episode 11950/30000, Total Reward: 78.0\n",
"Episode 12000/30000, Total Reward: 46.0\n",
"Episode 12050/30000, Total Reward: 69.0\n",
"Episode 12100/30000, Total Reward: 77.0\n",
"Episode 12150/30000, Total Reward: 35.0\n",
"Episode 12200/30000, Total Reward: 137.0\n",
"Episode 12250/30000, Total Reward: 96.0\n",
"Episode 12300/30000, Total Reward: 38.0\n",
"Episode 12350/30000, Total Reward: 109.0\n",
"Episode 12400/30000, Total Reward: 55.0\n",
"Episode 12450/30000, Total Reward: 62.0\n",
"Episode 12500/30000, Total Reward: 133.0\n",
"Episode 12550/30000, Total Reward: 70.0\n",
"Episode 12600/30000, Total Reward: 101.0\n",
"Episode 12650/30000, Total Reward: 45.0\n",
"Episode 12700/30000, Total Reward: 80.0\n",
"Episode 12750/30000, Total Reward: 67.0\n",
"Episode 12800/30000, Total Reward: 49.0\n",
"Episode 12850/30000, Total Reward: 70.0\n",
"Episode 12900/30000, Total Reward: 66.0\n",
"Episode 12950/30000, Total Reward: 70.0\n",
"Episode 13000/30000, Total Reward: 69.0\n",
"Episode 13050/30000, Total Reward: 112.0\n",
"Episode 13100/30000, Total Reward: 83.0\n",
"Episode 13150/30000, Total Reward: 21.0\n",
"Episode 13200/30000, Total Reward: 101.0\n",
"Episode 13250/30000, Total Reward: 62.0\n",
"Episode 13300/30000, Total Reward: 114.0\n",
"Episode 13350/30000, Total Reward: 110.0\n",
"Episode 13400/30000, Total Reward: 87.0\n",
"Episode 13450/30000, Total Reward: 154.0\n",
"Episode 13500/30000, Total Reward: 141.0\n",
"Episode 13550/30000, Total Reward: 71.0\n",
"Episode 13600/30000, Total Reward: 128.0\n",
"Episode 13650/30000, Total Reward: 121.0\n",
"Episode 13700/30000, Total Reward: 30.0\n",
"Episode 13750/30000, Total Reward: 50.0\n",
"Episode 13800/30000, Total Reward: 116.0\n",
"Episode 13850/30000, Total Reward: 78.0\n",
"Episode 13900/30000, Total Reward: 123.0\n",
"Episode 13950/30000, Total Reward: 27.0\n",
"Episode 14000/30000, Total Reward: 77.0\n",
"Episode 14050/30000, Total Reward: 56.0\n",
"Episode 14100/30000, Total Reward: 70.0\n",
"Episode 14150/30000, Total Reward: 75.0\n",
"Episode 14200/30000, Total Reward: 50.0\n",
"Episode 14250/30000, Total Reward: 53.0\n",
"Episode 14300/30000, Total Reward: 82.0\n",
"Episode 14350/30000, Total Reward: 115.0\n",
"Episode 14400/30000, Total Reward: 55.0\n",
"Episode 14450/30000, Total Reward: 79.0\n",
"Episode 14500/30000, Total Reward: 141.0\n",
"Episode 14550/30000, Total Reward: 78.0\n",
"Episode 14600/30000, Total Reward: 137.0\n",
"Episode 14650/30000, Total Reward: 62.0\n",
"Episode 14700/30000, Total Reward: 93.0\n",
"Episode 14750/30000, Total Reward: 111.0\n",
"Episode 14800/30000, Total Reward: 59.0\n",
"Episode 14850/30000, Total Reward: 89.0\n",
"Episode 14900/30000, Total Reward: 82.0\n",
"Episode 14950/30000, Total Reward: 70.0\n",
"Episode 15000/30000, Total Reward: 74.0\n",
"Episode 15050/30000, Total Reward: 81.0\n",
"Episode 15100/30000, Total Reward: 59.0\n",
"Episode 15150/30000, Total Reward: 59.0\n",
"Episode 15200/30000, Total Reward: 67.0\n",
"Episode 15250/30000, Total Reward: 87.0\n",
"Episode 15300/30000, Total Reward: 54.0\n",
"Episode 15350/30000, Total Reward: 108.0\n",
"Episode 15400/30000, Total Reward: 82.0\n",
"Episode 15450/30000, Total Reward: 76.0\n",
"Episode 15500/30000, Total Reward: 79.0\n",
"Episode 15550/30000, Total Reward: 95.0\n",
"Episode 15600/30000, Total Reward: 90.0\n",
"Episode 15650/30000, Total Reward: 101.0\n",
"Episode 15700/30000, Total Reward: 90.0\n",
"Episode 15750/30000, Total Reward: 128.0\n",
"Episode 15800/30000, Total Reward: 134.0\n",
"Episode 15850/30000, Total Reward: 97.0\n",
"Episode 15900/30000, Total Reward: 93.0\n",
"Episode 15950/30000, Total Reward: 117.0\n",
"Episode 16000/30000, Total Reward: 87.0\n",
"Episode 16050/30000, Total Reward: 129.0\n",
"Episode 16100/30000, Total Reward: 113.0\n",
"Episode 16150/30000, Total Reward: 86.0\n",
"Episode 16200/30000, Total Reward: 184.0\n",
"Episode 16250/30000, Total Reward: 96.0\n",
"Episode 16300/30000, Total Reward: 90.0\n",
"Episode 16350/30000, Total Reward: 103.0\n",
"Episode 16400/30000, Total Reward: 144.0\n",
"Episode 16450/30000, Total Reward: 87.0\n",
"Episode 16500/30000, Total Reward: 150.0\n",
"Episode 16550/30000, Total Reward: 162.0\n",
"Episode 16600/30000, Total Reward: 125.0\n",
"Episode 16650/30000, Total Reward: 94.0\n",
"Episode 16700/30000, Total Reward: 63.0\n",
"Episode 16750/30000, Total Reward: 110.0\n",
"Episode 16800/30000, Total Reward: 53.0\n",
"Episode 16850/30000, Total Reward: 80.0\n",
"Episode 16900/30000, Total Reward: 77.0\n",
"Episode 16950/30000, Total Reward: 128.0\n",
"Episode 17000/30000, Total Reward: 87.0\n",
"Episode 17050/30000, Total Reward: 126.0\n",
"Episode 17100/30000, Total Reward: 105.0\n",
"Episode 17150/30000, Total Reward: 255.0\n",
"Episode 17200/30000, Total Reward: 119.0\n",
"Episode 17250/30000, Total Reward: 38.0\n",
"Episode 17300/30000, Total Reward: 130.0\n",
"Episode 17350/30000, Total Reward: 125.0\n",
"Episode 17400/30000, Total Reward: 126.0\n",
"Episode 17450/30000, Total Reward: 131.0\n",
"Episode 17500/30000, Total Reward: 87.0\n",
"Episode 17550/30000, Total Reward: 187.0\n",
"Episode 17600/30000, Total Reward: 206.0\n",
"Episode 17650/30000, Total Reward: 176.0\n",
"Episode 17700/30000, Total Reward: 113.0\n",
"Episode 17750/30000, Total Reward: 150.0\n",
"Episode 17800/30000, Total Reward: 295.0\n",
"Episode 17850/30000, Total Reward: 120.0\n",
"Episode 17900/30000, Total Reward: 185.0\n",
"Episode 17950/30000, Total Reward: 163.0\n",
"Episode 18000/30000, Total Reward: 256.0\n",
"Episode 18050/30000, Total Reward: 148.0\n",
"Episode 18100/30000, Total Reward: 134.0\n",
"Episode 18150/30000, Total Reward: 190.0\n",
"Episode 18200/30000, Total Reward: 135.0\n",
"Episode 18250/30000, Total Reward: 141.0\n",
"Episode 18300/30000, Total Reward: 144.0\n",
"Episode 18350/30000, Total Reward: 117.0\n",
"Episode 18400/30000, Total Reward: 199.0\n",
"Episode 18450/30000, Total Reward: 133.0\n",
"Episode 18500/30000, Total Reward: 114.0\n",
"Episode 18550/30000, Total Reward: 178.0\n",
"Episode 18600/30000, Total Reward: 225.0\n",
"Episode 18650/30000, Total Reward: 213.0\n",
"Episode 18700/30000, Total Reward: 172.0\n",
"Episode 18750/30000, Total Reward: 142.0\n",
"Episode 18800/30000, Total Reward: 102.0\n",
"Episode 18850/30000, Total Reward: 113.0\n",
"Episode 18900/30000, Total Reward: 118.0\n",
"Episode 18950/30000, Total Reward: 147.0\n",
"Episode 19000/30000, Total Reward: 129.0\n",
"Episode 19050/30000, Total Reward: 180.0\n",
"Episode 19100/30000, Total Reward: 97.0\n",
"Episode 19150/30000, Total Reward: 135.0\n",
"Episode 19200/30000, Total Reward: 169.0\n",
"Episode 19250/30000, Total Reward: 124.0\n",
"Episode 19300/30000, Total Reward: 134.0\n",
"Episode 19350/30000, Total Reward: 102.0\n",
"Episode 19400/30000, Total Reward: 127.0\n",
"Episode 19450/30000, Total Reward: 250.0\n",
"Episode 19500/30000, Total Reward: 105.0\n",
"Episode 19550/30000, Total Reward: 130.0\n",
"Episode 19600/30000, Total Reward: 166.0\n",
"Episode 19650/30000, Total Reward: 86.0\n",
"Episode 19700/30000, Total Reward: 202.0\n",
"Episode 19750/30000, Total Reward: 171.0\n",
"Episode 19800/30000, Total Reward: 159.0\n",
"Episode 19850/30000, Total Reward: 247.0\n",
"Episode 19900/30000, Total Reward: 135.0\n",
"Episode 19950/30000, Total Reward: 108.0\n",
"Episode 20000/30000, Total Reward: 96.0\n",
"Episode 20050/30000, Total Reward: 24.0\n",
"Episode 20100/30000, Total Reward: 266.0\n",
"Episode 20150/30000, Total Reward: 138.0\n",
"Episode 20200/30000, Total Reward: 141.0\n",
"Episode 20250/30000, Total Reward: 119.0\n",
"Episode 20300/30000, Total Reward: 29.0\n",
"Episode 20350/30000, Total Reward: 189.0\n",
"Episode 20400/30000, Total Reward: 127.0\n",
"Episode 20450/30000, Total Reward: 140.0\n",
"Episode 20500/30000, Total Reward: 191.0\n",
"Episode 20550/30000, Total Reward: 135.0\n",
"Episode 20600/30000, Total Reward: 189.0\n",
"Episode 20650/30000, Total Reward: 129.0\n",
"Episode 20700/30000, Total Reward: 66.0\n",
"Episode 20750/30000, Total Reward: 218.0\n",
"Episode 20800/30000, Total Reward: 112.0\n",
"Episode 20850/30000, Total Reward: 142.0\n",
"Episode 20900/30000, Total Reward: 104.0\n",
"Episode 20950/30000, Total Reward: 134.0\n",
"Episode 21000/30000, Total Reward: 76.0\n",
"Episode 21050/30000, Total Reward: 130.0\n",
"Episode 21100/30000, Total Reward: 91.0\n",
"Episode 21150/30000, Total Reward: 134.0\n",
"Episode 21200/30000, Total Reward: 149.0\n",
"Episode 21250/30000, Total Reward: 24.0\n",
"Episode 21300/30000, Total Reward: 101.0\n",
"Episode 21350/30000, Total Reward: 198.0\n",
"Episode 21400/30000, Total Reward: 68.0\n",
"Episode 21450/30000, Total Reward: 86.0\n",
"Episode 21500/30000, Total Reward: 147.0\n",
"Episode 21550/30000, Total Reward: 83.0\n",
"Episode 21600/30000, Total Reward: 68.0\n",
"Episode 21650/30000, Total Reward: 84.0\n",
"Episode 21700/30000, Total Reward: 29.0\n",
"Episode 21750/30000, Total Reward: 96.0\n",
"Episode 21800/30000, Total Reward: 58.0\n",
"Episode 21850/30000, Total Reward: 133.0\n",
"Episode 21900/30000, Total Reward: 110.0\n",
"Episode 21950/30000, Total Reward: 117.0\n",
"Episode 22000/30000, Total Reward: 84.0\n",
"Episode 22050/30000, Total Reward: 135.0\n",
"Episode 22100/30000, Total Reward: 78.0\n",
"Episode 22150/30000, Total Reward: 133.0\n",
"Episode 22200/30000, Total Reward: 39.0\n",
"Episode 22250/30000, Total Reward: 88.0\n",
"Episode 22300/30000, Total Reward: 92.0\n",
"Episode 22350/30000, Total Reward: 94.0\n",
"Episode 22400/30000, Total Reward: 99.0\n",
"Episode 22450/30000, Total Reward: 89.0\n",
"Episode 22500/30000, Total Reward: 99.0\n",
"Episode 22550/30000, Total Reward: 132.0\n",
"Episode 22600/30000, Total Reward: 69.0\n",
"Episode 22650/30000, Total Reward: 96.0\n",
"Episode 22700/30000, Total Reward: 155.0\n",
"Episode 22750/30000, Total Reward: 133.0\n",
"Episode 22800/30000, Total Reward: 113.0\n",
"Episode 22850/30000, Total Reward: 124.0\n",
"Episode 22900/30000, Total Reward: 145.0\n",
"Episode 22950/30000, Total Reward: 103.0\n",
"Episode 23000/30000, Total Reward: 95.0\n",
"Episode 23050/30000, Total Reward: 84.0\n",
"Episode 23100/30000, Total Reward: 107.0\n",
"Episode 23150/30000, Total Reward: 170.0\n",
"Episode 23200/30000, Total Reward: 218.0\n",
"Episode 23250/30000, Total Reward: 162.0\n",
"Episode 23300/30000, Total Reward: 135.0\n",
"Episode 23350/30000, Total Reward: 52.0\n",
"Episode 23400/30000, Total Reward: 155.0\n",
"Episode 23450/30000, Total Reward: 188.0\n",
"Episode 23500/30000, Total Reward: 146.0\n",
"Episode 23550/30000, Total Reward: 93.0\n",
"Episode 23600/30000, Total Reward: 189.0\n",
"Episode 23650/30000, Total Reward: 97.0\n",
"Episode 23700/30000, Total Reward: 270.0\n",
"Episode 23750/30000, Total Reward: 162.0\n",
"Episode 23800/30000, Total Reward: 84.0\n",
"Episode 23850/30000, Total Reward: 51.0\n",
"Episode 23900/30000, Total Reward: 175.0\n",
"Episode 23950/30000, Total Reward: 133.0\n",
"Episode 24000/30000, Total Reward: 110.0\n",
"Episode 24050/30000, Total Reward: 129.0\n",
"Episode 24100/30000, Total Reward: 66.0\n",
"Episode 24150/30000, Total Reward: 104.0\n",
"Episode 24200/30000, Total Reward: 111.0\n",
"Episode 24250/30000, Total Reward: 72.0\n",
"Episode 24300/30000, Total Reward: 92.0\n",
"Episode 24350/30000, Total Reward: 112.0\n",
"Episode 24400/30000, Total Reward: 104.0\n",
"Episode 24450/30000, Total Reward: 130.0\n",
"Episode 24500/30000, Total Reward: 36.0\n",
"Episode 24550/30000, Total Reward: 164.0\n",
"Episode 24600/30000, Total Reward: 180.0\n",
"Episode 24650/30000, Total Reward: 137.0\n",
"Episode 24700/30000, Total Reward: 180.0\n",
"Episode 24750/30000, Total Reward: 108.0\n",
"Episode 24800/30000, Total Reward: 52.0\n",
"Episode 24850/30000, Total Reward: 86.0\n",
"Episode 24900/30000, Total Reward: 158.0\n",
"Episode 24950/30000, Total Reward: 114.0\n",
"Episode 25000/30000, Total Reward: 110.0\n",
"Episode 25050/30000, Total Reward: 87.0\n",
"Episode 25100/30000, Total Reward: 99.0\n",
"Episode 25150/30000, Total Reward: 116.0\n",
"Episode 25200/30000, Total Reward: 48.0\n",
"Episode 25250/30000, Total Reward: 96.0\n",
"Episode 25300/30000, Total Reward: 137.0\n",
"Episode 25350/30000, Total Reward: 133.0\n",
"Episode 25400/30000, Total Reward: 106.0\n",
"Episode 25450/30000, Total Reward: 87.0\n",
"Episode 25500/30000, Total Reward: 134.0\n",
"Episode 25550/30000, Total Reward: 107.0\n",
"Episode 25600/30000, Total Reward: 126.0\n",
"Episode 25650/30000, Total Reward: 102.0\n",
"Episode 25700/30000, Total Reward: 203.0\n",
"Episode 25750/30000, Total Reward: 250.0\n",
"Episode 25800/30000, Total Reward: 224.0\n",
"Episode 25850/30000, Total Reward: 107.0\n",
"Episode 25900/30000, Total Reward: 84.0\n",
"Episode 25950/30000, Total Reward: 124.0\n",
"Episode 26000/30000, Total Reward: 138.0\n",
"Episode 26050/30000, Total Reward: 188.0\n",
"Episode 26100/30000, Total Reward: 105.0\n",
"Episode 26150/30000, Total Reward: 145.0\n",
"Episode 26200/30000, Total Reward: 128.0\n",
"Episode 26250/30000, Total Reward: 135.0\n",
"Episode 26300/30000, Total Reward: 68.0\n",
"Episode 26350/30000, Total Reward: 177.0\n",
"Episode 26400/30000, Total Reward: 226.0\n",
"Episode 26450/30000, Total Reward: 126.0\n",
"Episode 26500/30000, Total Reward: 117.0\n",
"Episode 26550/30000, Total Reward: 107.0\n",
"Episode 26600/30000, Total Reward: 114.0\n",
"Episode 26650/30000, Total Reward: 98.0\n",
"Episode 26700/30000, Total Reward: 103.0\n",
"Episode 26750/30000, Total Reward: 143.0\n",
"Episode 26800/30000, Total Reward: 174.0\n",
"Episode 26850/30000, Total Reward: 149.0\n",
"Episode 26900/30000, Total Reward: 198.0\n",
"Episode 26950/30000, Total Reward: 155.0\n",
"Episode 27000/30000, Total Reward: 105.0\n",
"Episode 27050/30000, Total Reward: 157.0\n",
"Episode 27100/30000, Total Reward: 177.0\n",
"Episode 27150/30000, Total Reward: 189.0\n",
"Episode 27200/30000, Total Reward: 33.0\n",
"Episode 27250/30000, Total Reward: 127.0\n",
"Episode 27300/30000, Total Reward: 173.0\n",
"Episode 27350/30000, Total Reward: 251.0\n",
"Episode 27400/30000, Total Reward: 238.0\n",
"Episode 27450/30000, Total Reward: 250.0\n",
"Episode 27500/30000, Total Reward: 166.0\n",
"Episode 27550/30000, Total Reward: 221.0\n",
"Episode 27600/30000, Total Reward: 218.0\n",
"Episode 27650/30000, Total Reward: 133.0\n",
"Episode 27700/30000, Total Reward: 242.0\n",
"Episode 27750/30000, Total Reward: 166.0\n",
"Episode 27800/30000, Total Reward: 181.0\n",
"Episode 27850/30000, Total Reward: 160.0\n",
"Episode 27900/30000, Total Reward: 145.0\n",
"Episode 27950/30000, Total Reward: 171.0\n",
"Episode 28000/30000, Total Reward: 181.0\n",
"Episode 28050/30000, Total Reward: 34.0\n",
"Episode 28100/30000, Total Reward: 221.0\n",
"Episode 28150/30000, Total Reward: 175.0\n",
"Episode 28200/30000, Total Reward: 86.0\n",
"Episode 28250/30000, Total Reward: 97.0\n",
"Episode 28300/30000, Total Reward: 46.0\n",
"Episode 28350/30000, Total Reward: 109.0\n",
"Episode 28400/30000, Total Reward: 144.0\n",
"Episode 28450/30000, Total Reward: 182.0\n",
"Episode 28500/30000, Total Reward: 107.0\n",
"Episode 28550/30000, Total Reward: 158.0\n",
"Episode 28600/30000, Total Reward: 177.0\n",
"Episode 28650/30000, Total Reward: 165.0\n",
"Episode 28700/30000, Total Reward: 131.0\n",
"Episode 28750/30000, Total Reward: 245.0\n",
"Episode 28800/30000, Total Reward: 275.0\n",
"Episode 28850/30000, Total Reward: 138.0\n",
"Episode 28900/30000, Total Reward: 146.0\n",
"Episode 28950/30000, Total Reward: 170.0\n",
"Episode 29000/30000, Total Reward: 235.0\n",
"Episode 29050/30000, Total Reward: 500.0\n",
"Episode 29100/30000, Total Reward: 356.0\n",
"Episode 29150/30000, Total Reward: 179.0\n",
"Episode 29200/30000, Total Reward: 238.0\n",
"Episode 29250/30000, Total Reward: 133.0\n",
"Episode 29300/30000, Total Reward: 113.0\n",
"Episode 29350/30000, Total Reward: 163.0\n",
"Episode 29400/30000, Total Reward: 169.0\n",
"Episode 29450/30000, Total Reward: 355.0\n",
"Episode 29500/30000, Total Reward: 233.0\n",
"Episode 29550/30000, Total Reward: 132.0\n",
"Episode 29600/30000, Total Reward: 172.0\n",
"Episode 29650/30000, Total Reward: 154.0\n",
"Episode 29700/30000, Total Reward: 269.0\n",
"Episode 29750/30000, Total Reward: 211.0\n",
"Episode 29800/30000, Total Reward: 179.0\n",
"Episode 29850/30000, Total Reward: 121.0\n",
"Episode 29900/30000, Total Reward: 121.0\n",
"Episode 29950/30000, Total Reward: 157.0\n"
]
},
{
"data": {
"image/png": "",
"text/plain": [
"