Deep Reinforcement Learning - University of The Punjab

Study the gridworld_value_iteration.ipynb notebook, which trains an agent to navigate a simple Grid World using Value Iteration. Then make a GIF that shows each of the following agents solving a 5x5 GridWorld with 5 holes:

(10 marks) Random
(30 marks) Monte Carlo
(30 marks) Q-Learning
(30 marks) SARSA

Place 2 holes close to the starting position, 2 holes close to the goal, and 1 hole somewhere else. Ensure that the grid is solvable, yet challenging. Submit your code and your GIF on Google Classroom.
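
One way to confirm that a hole layout is solvable before training any agent is a breadth-first search from the start to the goal. The sketch below is only an illustration under stated assumptions: the start and goal are placed in opposite corners (the assignment does not fix them), the hole coordinates are a hypothetical layout, and the names SIZE, START, GOAL, HOLES, shortest_path_length, and render are made up for this example and do not come from the notebook.

```python
from collections import deque

SIZE = 5
# Assumption: start and goal sit in opposite corners; the assignment does not fix them.
START, GOAL = (0, 0), (4, 4)
# Hypothetical layout as (row, col): 2 holes near the start, 2 near the goal, 1 elsewhere.
HOLES = {(0, 2), (1, 1), (3, 4), (4, 2), (2, 3)}


def shortest_path_length(start, goal, holes, size=SIZE):
    """Return the fewest up/down/left/right moves from start to goal
    that avoid every hole, or None if the goal cannot be reached."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (row, col), dist = queue.popleft()
        if (row, col) == goal:
            return dist
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in holes and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def render(start, goal, holes, size=SIZE):
    """Return the grid as a list of strings: S start, G goal, H hole, . free."""
    rows = []
    for row in range(size):
        cells = []
        for col in range(size):
            if (row, col) == start:
                cells.append("S")
            elif (row, col) == goal:
                cells.append("G")
            elif (row, col) in holes:
                cells.append("H")
            else:
                cells.append(".")
        rows.append("".join(cells))
    return rows


if __name__ == "__main__":
    print("\n".join(render(START, GOAL, HOLES)))
    print("shortest safe path:", shortest_path_length(START, GOAL, HOLES))
```

In this hypothetical layout the goal can only be entered through a single neighbouring cell, so the grid stays solvable while leaving little room for error. A result of None from shortest_path_length would mean the holes need to be rearranged.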

Deep Reinforcement Learning (CS-866)

Department of Computer Science
University of The Punjab

Assignment 2