1. Empirically compare two different multi-armed bandit methods (e.g., as in Figure 2.5 or 2.6). Consider whether to use optimistic or pessimistic initial value estimates. For a challenge, consider implementing UCB1.

2. Solve tic-tac-toe using dynamic programming. How will you test the agent's performance?

3. Solve the gridworld problem from the Isbell/Littman videos using dynamic programming.

4. Do the suggested programming assignment from the book.

5. ???
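
As a possible starting point for item 1, here is a minimal sketch that compares epsilon-greedy with UCB1. Everything specific in it is an assumption for illustration, not something taken from the book or the videos: the arms pay Bernoulli rewards (0 or 1) with success probabilities drawn uniformly from [0, 1], and the names `run_bandit`, `epsilon_greedy`, `ucb1`, and `compare` are hypothetical helpers. The `initial_value` parameter is one way to try optimistic (high) or pessimistic (low) starting estimates.

```python
import math
import random


def run_bandit(select_arm, true_means, steps, rng, initial_value=0.0):
    """Play one run and return the average reward per step."""
    k = len(true_means)
    counts = [0] * k                  # number of pulls per arm
    values = [initial_value] * k      # running value estimate per arm
    total = 0.0
    for t in range(1, steps + 1):
        arm = select_arm(counts, values, t, rng)
        # Assumed reward model: Bernoulli with success probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update; with initial_value != 0 the first pull overwrites it.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / steps


def epsilon_greedy(epsilon):
    """Return a selection rule that explores with probability epsilon."""
    def select(counts, values, t, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        # Greedy choice; ties go to the lowest-numbered arm.
        return max(range(len(values)), key=values.__getitem__)
    return select


def ucb1(counts, values, t, rng):
    """UCB1: try each arm once, then maximize estimate + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(values)),
        key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def compare(runs=200, steps=1000, k=10, seed=0):
    """Average per-step reward of each method over several random problems."""
    rng = random.Random(seed)
    results = {"epsilon-greedy": 0.0, "UCB1": 0.0}
    for _ in range(runs):
        true_means = [rng.random() for _ in range(k)]
        results["epsilon-greedy"] += run_bandit(
            epsilon_greedy(0.1), true_means, steps, rng) / runs
        results["UCB1"] += run_bandit(ucb1, true_means, steps, rng) / runs
    return results


if __name__ == "__main__":
    for name, avg in compare().items():
        print(f"{name}: average reward per step = {avg:.3f}")
```

Average reward per step is only one way to score the methods; tracking the fraction of steps on which the best arm was chosen, or plotting reward against the step number, would give a comparison closer to the book's figures.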