Course Schedule
The course will follow the tentative schedule below. The schedule is flexible and may change depending on the pace of progress, student feedback, and unforeseen issues.
Event | Week | Description | Required reading |
---|---|---|---|
Lecture | 1 | Introduction to Reinforcement Learning | SB (Sutton and Barto) Chp 1 |
Lecture | 1 | Multi-armed bandits | SB Chp 2 |
Lecture | 2 | Markov decision processes | SB Chp 3 |
Assignment | 3 | Value Iteration | |
Lecture | 3 | Dynamic programming | SB Chp 4 |
Assignment | 4 | Policy Iteration | |
Lecture | 4 | Monte Carlo methods | SB Chp 5 |
Assignment | 4 | Monte Carlo methods | |
Lecture | 5 | Temporal-difference learning | SB Chp 6 |
Assignment | 5 | Q-learning | |
Lecture | 6 | n-step bootstrapping | SB Chp 7 |
Lecture | 6 | Tabular methods | SB Chp 8 |
Project | 6 | Project proposal due | |
Lecture | 7 | Function approximation | SB Chp 9 |
Lecture | 7 | Deep neural network approximation | DNN tutorial |
Lecture | 8 | On-policy control with approximation | SB Chp 10 |
Lecture | 9 | Eligibility traces | SB Chp 12 |
Lecture | 9 | Deep Q-learning | DQN, DDQN |
Assignment | 9 | DQN | |
Lecture | 10 | Policy gradient methods | SB Chp 13 |
Assignment | 10 | REINFORCE | |
Lecture | 10 | Actor-critic methods | SB Chp 13.5 |
Lecture | 11 | Trust regions | TRPO, PPO |
Assignment | 11 | A2C | |
Lecture | 11 | Soft Actor-Critic | SAC |
Project | 11 | Literature review due | |
Lecture | 12 | Imitation learning | DAgger |
Lecture | 12 | Inverse Reinforcement Learning | Apprenticeship Learning, GAIL |
Lecture | 13 | Transfer and Multi-task Learning | Progressive Neural Networks |
Lecture | 13 | Derivative-free optimization | CMA-ES |
Lecture | 14 | Curriculum learning | |
Lecture | 14 | Advanced topics TBD | |
Project | 14 | Final paper due | |