| Lecture | Topic | Readings | Tutorials & Evaluations |
|---|---|---|---|
| 1 | Introduction | DRL: Ch 1 | |
| 2 | Markov Decision Process (MDP) | DRL: Sec 2.1 - 2.2.2 | Quiz 1 |
| 3 | MDP Objective Function | DRL: Sec 2.2.3 | Quiz 2 |
| 4 | Model-Based Learning | DRL: Sec 2.2.4.1 | Quiz 3; GridWorld via Value Iteration (.ipynb, .html) |
| 5 | Model-Free Learning | DRL: Sec 2.2.4.2 - 5 | Quiz 4; Assignment 1 |
| 6 | Deep Value-Based Agents | DRL: Sec 3.1 - 3.2.2 | Quiz 5, Quiz 6, Quiz 7; Assignment 2 |
| 7 | Stable Deep Value-Based Learning | DRL: Sec 3.2.3 - 3.2.4 | |
| 8 | Policy-Based Learning | DRL: Sec 4.1 - 4.2.2 | Cartpole Balancing via REINFORCE (.ipynb, .html); Gaussian Density (.ipynb, .html); Mountain Car via REINFORCE (.ipynb, .html) |
| 9 | Improved Policy-Based Learning I: Actor-Critic Methods | DRL: Sec 4.2.3 - 4.2.4 | |
| 10 | Improved Policy-Based Learning II: Trust Region Methods | DRL: Sec 4.2.5 | |
| 11 | Improved Policy-Based Learning III: Soft Actor-Critic (SAC) | DRL: Sec 4.2.6 | |
| 12 | Improved Policy-Based Learning IV: Deterministic Policy Gradient | DRL: Sec 4.2.7 | |
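The Lecture 4 tutorial applies value iteration to a grid world. A minimal sketch of the idea, using a hypothetical 4x4 deterministic grid (the actual GridWorld notebook may use a different layout, rewards, or discount):

```python
import numpy as np

# Hypothetical setup: 4x4 grid, deterministic moves, step reward -0.04,
# terminal goal in the bottom-right corner worth +1, discount 0.9.
N = 4
GAMMA = 0.9
TERMINAL = N * N - 1

def step(s, a):
    """Deterministic transition; a in {0: up, 1: down, 2: left, 3: right}."""
    r, c = divmod(s, N)
    if a == 0:
        r = max(r - 1, 0)
    elif a == 1:
        r = min(r + 1, N - 1)
    elif a == 2:
        c = max(c - 1, 0)
    else:
        c = min(c + 1, N - 1)
    s2 = r * N + c
    return s2, (1.0 if s2 == TERMINAL else -0.04)

V = np.zeros(N * N)
while True:
    V_new = V.copy()
    for s in range(N * N):
        if s == TERMINAL:
            continue  # terminal state keeps value 0
        # Bellman optimality backup: best action under the current V
        V_new[s] = max(rew + GAMMA * V[s2]
                       for s2, rew in (step(s, a) for a in range(4)))
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-8:
        break
```

After convergence, the state one step from the goal has value 1.0 (it collects the +1 immediately), and values decay with distance from the goal.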
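The Lecture 8 tutorials (Cartpole, Mountain Car) train policies with REINFORCE. A stripped-down sketch of the update on a hypothetical two-armed bandit, so the gradient step is visible without a full environment (the notebooks themselves use real control tasks and neural policies):

```python
import numpy as np

# Hypothetical bandit: arm 1 pays +1, arm 0 pays 0.
# The policy is a softmax over two action preferences.
rng = np.random.default_rng(0)
theta = np.zeros(2)   # action preferences
ALPHA = 0.1           # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)      # sample an action from the policy
    G = 1.0 if a == 1 else 0.0      # return of this one-step episode
    # REINFORCE: theta += alpha * G * grad log pi(a | theta);
    # for a softmax policy, grad log pi(a) = one_hot(a) - probs
    grad_log = -probs
    grad_log[a] += 1.0
    theta += ALPHA * G * grad_log
```

After training, the policy strongly prefers the rewarding arm, which is the same score-function update the notebooks apply per episode with the discounted return as G.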