1. Empirically compare two different multi-armed bandit methods (e.g., as in Figure 2.5 or 2.6). Consider whether to use optimistic or pessimistic initialization of the value estimates. For a challenge, consider implementing UCB1.
2. Solve tic-tac-toe using dynamic programming. How will you test the agent's performance?
3. Solve the gridworld problem from the Isbell/Littman videos using dynamic programming.
4. Suggested programming assignment from the book.
5. ???
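As a possible starting point for the first assignment, the sketch below compares UCB1 with epsilon-greedy. It is only an illustration under stated assumptions: the arms are Bernoulli with made-up success probabilities (not the testbed behind the book's figures), epsilon is fixed at 0.1, value estimates start at zero, and all function names (`run_bandit`, `ucb1`, `make_epsilon_greedy`, `compare`) are hypothetical.

```python
import math
import random


def run_bandit(select_arm, true_means, steps, rng):
    """Play one run on Bernoulli arms; return the average reward per step."""
    k = len(true_means)
    counts = [0] * k    # number of pulls per arm
    values = [0.0] * k  # sample-average reward estimate per arm (assumed to start at 0)
    total_reward = 0.0
    for _ in range(steps):
        arm = select_arm(counts, values, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return total_reward / steps


def ucb1(counts, values, rng):
    """UCB1: pull each arm once, then maximize mean + sqrt(2 ln n / n_j)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total_pulls = sum(counts)
    scores = [v + math.sqrt(2.0 * math.log(total_pulls) / n)
              for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def make_epsilon_greedy(epsilon):
    """Return a selector that explores uniformly with probability epsilon."""
    def select(counts, values, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)
    return select


def compare(true_means, steps=2000, runs=50, seed=0):
    """Average reward per step for each method, averaged over several runs."""
    rng = random.Random(seed)
    methods = {"ucb1": ucb1, "eps_greedy_0.1": make_epsilon_greedy(0.1)}
    return {
        name: sum(run_bandit(m, true_means, steps, rng) for _ in range(runs)) / runs
        for name, m in methods.items()
    }
```

Calling `compare([0.2, 0.5, 0.8])` returns a dictionary mapping each method name to its average reward per step. To study optimistic or pessimistic initialization, one option would be to make the starting contents of `values` a parameter instead of fixing them at zero.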