AIMA: Artificial Intelligence: A Modern Approach

AIMA-exercises is an open-source community of students, instructors, and developers. Anyone can add an exercise, suggest answers to existing questions, or simply help us improve the platform. We accept contributions on this GitHub repository.

Exercise 17.9

This exercise considers two-player MDPs that correspond to zero-sum, turn-taking games like those in Chapter [game-playing-chapter]. Let the players be $A$ and $B$, and let $R(s)$ be the reward for player $A$ in state $s$. (The reward for $B$ is always equal and opposite.)

  1. Let $U_A(s)$ be the utility of state $s$ when it is $A$’s turn to move in $s$, and let $U_B(s)$ be the utility of state $s$ when it is $B$’s turn to move in $s$. All rewards and utilities are calculated from $A$’s point of view (just as in a minimax game tree). Write down Bellman equations defining $U_A(s)$ and $U_B(s)$. (One way of writing such equations is sketched after this list.)

  2. Explain how to do two-player value iteration with these equations, and define a suitable termination criterion. (A small code sketch of such a procedure follows the list.)

  3. Consider the game described in Figure [line-game4-figure]. Draw the state space (rather than the game tree), showing the moves by $A$ as solid lines and moves by $B$ as dashed lines. Mark each state with $R(s)$. You will find it helpful to arrange the states $(s_A,s_B)$ on a two-dimensional grid, using $s_A$ and $s_B$ as “coordinates.”

  4. Now apply two-player value iteration to solve this game, and derive the optimal policy.
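
For part 1, one consistent way to write the equations, using the standard MDP transition model $P(s' \mid s, a)$ and action set $\mathcal{A}(s)$ and treating the game as undiscounted (this notation is assumed here, not fixed by the exercise), is:

$$U_A(s) = R(s) + \max_{a \in \mathcal{A}(s)} \sum_{s'} P(s' \mid s, a)\, U_B(s')$$

$$U_B(s) = R(s) + \min_{a \in \mathcal{A}(s)} \sum_{s'} P(s' \mid s, a)\, U_A(s')$$

In a deterministic game each sum collapses to the utility of the single successor state; $A$ maximizes because utilities are measured from $A$’s point of view, and $B$ minimizes for the same reason.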
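For part 2, the sketch below is one minimal way to implement two-player value iteration, assuming a deterministic turn-taking game supplied as a state list, a reward function, a successor function, and a set of terminal states; the names `R`, `moves`, and `terminals` are illustrative, not from the book. It alternates Bellman updates for the two equations above and terminates when the largest change in any utility falls below a tolerance.

```python
def two_player_value_iteration(states, R, moves, terminals, eps=1e-6):
    """Return (U_A, U_B): utilities from A's point of view when it is
    A's (respectively B's) turn to move in each state.

    states    -- iterable of states
    R         -- R(s): reward for player A in state s
    moves     -- moves(s): list of successor states reachable in one move
    terminals -- set of terminal states
    eps       -- termination tolerance on the largest utility change
    """
    U_A = {s: R(s) for s in states}
    U_B = {s: R(s) for s in states}
    while True:
        delta = 0.0
        new_A, new_B = {}, {}
        for s in states:
            if s in terminals:
                new_A[s] = new_B[s] = R(s)
            else:
                # A maximizes; it is then B's turn in the successor state.
                new_A[s] = R(s) + max(U_B[t] for t in moves(s))
                # B minimizes A's utility; it is then A's turn.
                new_B[s] = R(s) + min(U_A[t] for t in moves(s))
            delta = max(delta, abs(new_A[s] - U_A[s]), abs(new_B[s] - U_B[s]))
        U_A, U_B = new_A, new_B
        if delta < eps:  # termination: utilities have effectively stopped changing
            return U_A, U_B
```

The optimal policy for part 4 can then be read off by choosing, in each state where it is $A$’s turn, a move leading to a successor with maximal $U_B$, and symmetrically (minimal $U_A$) for $B$.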

Figure [grid-mdp-figure] (a) $3 \times 3$ world for Exercise [3x3-mdp-exercise]. The reward for each state is indicated. The upper right square is a terminal state. (b) $101 \times 3$ world for Exercise [101x3-mdp-exercise] (omitting 93 identical columns in the middle). The start state has reward 0.

