1. Do Example 4.2 in Sutton & Barto. It defines a decision-making problem based on managing rental car inventories. Solve for the optimal policy using both Value Iteration and Policy Iteration, then compare your results. Policy Iteration requires policy evaluation, for which you may use either a matrix method (solving the linear Bellman system directly) or dynamic programming (iterating the Bellman backup to convergence).
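
The two policy evaluation options mentioned above can be sketched as follows. This is a minimal illustration on a made-up 3-state chain, not the rental car problem itself: the names `P_pi`, `r_pi`, `evaluate_policy_matrix`, and `evaluate_policy_iterative`, and all the numbers, are assumptions chosen for the example. For a fixed policy, `P_pi` is the state-to-state transition matrix under that policy and `r_pi` is the expected one-step reward in each state.

```python
import numpy as np


def evaluate_policy_matrix(P_pi, r_pi, gamma):
    """Matrix method: solve (I - gamma * P_pi) v = r_pi in one linear solve."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)


def evaluate_policy_iterative(P_pi, r_pi, gamma, tol=1e-10, max_sweeps=100_000):
    """Dynamic programming: repeat the backup v <- r_pi + gamma * P_pi v."""
    v = np.zeros_like(r_pi, dtype=float)
    for _ in range(max_sweeps):
        v_new = r_pi + gamma * (P_pi @ v)
        # Stop once a full sweep changes no state value by more than tol.
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


# Hypothetical 3-state example (illustrative values only; each row sums to 1).
P_pi = np.array([
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
])
r_pi = np.array([1.0, 0.0, 2.0])
gamma = 0.9

v_matrix = evaluate_policy_matrix(P_pi, r_pi, gamma)
v_iter = evaluate_policy_iterative(P_pi, r_pi, gamma)
print("matrix method:   ", v_matrix)
print("iterative method:", v_iter)
```

Both routines should agree up to the stopping tolerance. The matrix method costs one solve that scales roughly cubically with the number of states, while the iterative method costs one matrix-vector product per sweep, so the better choice depends on how large the state space is.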
Alternative problems, if you are so inspired...