Sometimes MDPs are formulated with a
reward function $R(s,a)$ that depends on the action taken or with a
reward function $R(s,a,s')$ that also depends on the outcome state.
1. Write the Bellman equations for these formulations.
2. Show how an MDP with reward function $R(s,a,s')$ can be transformed into a different MDP with reward function $R(s,a)$, such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
3. Now do the same to convert MDPs with $R(s,a)$ into MDPs with $R(s)$.
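The minimal Python sketch below is one way to check candidate answers numerically. It makes a few assumptions. It uses a small finite MDP stored as nested lists, with `P[s][a][t]` the probability of reaching `t` after doing `a` in `s`, and the same actions available in every state. It writes $P(s' \mid s,a)$ for the transition model and $\gamma$ for the discount. All function names are hypothetical.

The sketch runs value iteration with the backups $U(s)=\max_a \sum_{s'} P(s'\mid s,a)\,[R(s,a,s')+\gamma U(s')]$ and $U(s)=\max_a [R(s,a)+\gamma \sum_{s'} P(s'\mid s,a)\,U(s')]$. It then folds the outcome reward into $R(s,a)=\sum_{s'}P(s'\mid s,a)\,R(s,a,s')$. Finally, it removes the action dependence using one possible construction: a pre-state $(s,a)$ for every state-action pair, with reward $\gamma^{-1/2}R(s,a)$ in the pre-state, zero reward elsewhere, and discount $\sqrt{\gamma}$.

```python
import math

# Hypothetical representation (an assumption, not taken from the exercise):
#   P[s][a][t] = probability of reaching state t after doing action a in state s;
#   every state offers the same actions 0..m-1.


def q_sas(P, R3, gamma):
    """Q(s,a) = sum_t P(t|s,a) * (R(s,a,t) + gamma * U(t))."""
    return lambda s, a, U: sum(p * (R3[s][a][t] + gamma * U[t])
                               for t, p in enumerate(P[s][a]))


def q_sa(P, R2, gamma):
    """Q(s,a) = R(s,a) + gamma * sum_t P(t|s,a) * U(t)."""
    return lambda s, a, U: R2[s][a] + gamma * sum(
        p * U[t] for t, p in enumerate(P[s][a]))


def q_s(P, R1, gamma):
    """Q(s,a) = R(s) + gamma * sum_t P(t|s,a) * U(t), so U(s) = max_a Q(s,a)."""
    return lambda s, a, U: R1[s] + gamma * sum(
        p * U[t] for t, p in enumerate(P[s][a]))


def solve(q, n, m, iters=1000):
    """Value iteration U(s) <- max_a Q(s,a); returns (U, greedy policy)."""
    U = [0.0] * n
    for _ in range(iters):
        U = [max(q(s, a, U) for a in range(m)) for s in range(n)]
    policy = [max(range(m), key=lambda a: q(s, a, U)) for s in range(n)]
    return U, policy


def fold_outcome_reward(P, R3):
    """R(s,a,s') -> R(s,a): expected reward over outcomes; P and gamma unchanged."""
    return [[sum(p * R3[s][a][t] for t, p in enumerate(P[s][a]))
             for a in range(len(P[s]))] for s in range(len(P))]


def add_pre_states(P, R2, gamma):
    """R(s,a) -> R(s): doing a in s leads surely to pre-state (s,a), whose
    reward is R(s,a)/sqrt(gamma); every action in (s,a) then follows P(.|s,a).
    Two new steps replace one old step, so the new discount is sqrt(gamma)."""
    n, m = len(P), len(P[0])
    size = n + n * m

    def pre(s, a):  # index of the hypothetical pre-state (s,a)
        return n + s * m + a

    P1 = [[[0.0] * size for _ in range(m)] for _ in range(size)]
    R1 = [0.0] * size  # original states now carry zero reward
    for s in range(n):
        for a in range(m):
            P1[s][a][pre(s, a)] = 1.0
            R1[pre(s, a)] = R2[s][a] / math.sqrt(gamma)
            for b in range(m):
                for t, p in enumerate(P[s][a]):
                    P1[pre(s, a)][b][t] = p
    return P1, R1, math.sqrt(gamma)


# Example: a two-state, two-action MDP whose rewards depend on the outcome state.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]
R3 = [[[0.0, 1.0], [0.0, 2.0]],
      [[3.0, -1.0], [0.0, 0.5]]]
gamma = 0.9

U3, pi3 = solve(q_sas(P, R3, gamma), 2, 2)
R2 = fold_outcome_reward(P, R3)
U2, pi2 = solve(q_sa(P, R2, gamma), 2, 2)
P1, R1, gamma1 = add_pre_states(P, R2, gamma)
U1, pi1 = solve(q_s(P1, R1, gamma1), len(P1), 2)
# Compare on the original states 0 and 1 only: U3, U2, U1[:2] and pi3, pi2, pi1[:2].
```

Two steps of the new MDP stand for one step of the original, so the $\gamma^{-1/2}$ factor on the pre-state reward cancels the extra $\sqrt{\gamma}$ discount. As a result, the values and greedy actions of the original states should agree across all three formulations.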