Comments (3)
-
Uh... could you add a figure?
Thanks! That was careless of me.
-
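By Eq. (3.14) in Sutton's book, the value of the current state is related to the values of its successor states by
$$v_\pi(s)=\sum_{a} \pi(a \mid s) \sum_{s^{\prime}, r} p\left(s^{\prime}, r \mid s, a\right)\left[r+\gamma v_{\pi}\left(s^{\prime}\right)\right], \quad \text{for all } s \in \mathcal{S}.$$
Here $\pi(a \mid s)=0.5$, $p(s^{\prime}, r \mid s, a)=1$, and $\gamma=1$. The reward is $r=0$ on every transition except the step from E into the right terminal state, where $r=1$, and both terminal states have value 0. This gives
$$\left\{ \begin{aligned} V(A) &= \frac{1}{2}\cdot 0 + \frac{1}{2}V(B) \\ V(B) &= \frac{1}{2}V(A) + \frac{1}{2}V(C) \\ V(C) &= \frac{1}{2}V(B) + \frac{1}{2}V(D) \\ V(D) &= \frac{1}{2}V(C) + \frac{1}{2}V(E) \\ V(E) &= \frac{1}{2}V(D) + \frac{1}{2}\cdot 1 \end{aligned} \right.$$
Each value is the average of its two neighbours, so the values rise in equal steps from 0 to 1. Solving gives $V(A)=\frac{1}{6}$, $V(B)=\frac{2}{6}$, $V(C)=\frac{3}{6}$, $V(D)=\frac{4}{6}$, $V(E)=\frac{5}{6}$.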
If there is discounting, substitute the corresponding $\gamma$ into the same equations and solve again.
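As a rough check of the numbers above, and of what changes under discounting, the five equations can be solved exactly in a few lines. This is only a sketch under the assumptions stated in the answer: a random policy with probability 1/2 each way, terminal states worth 0, and a reward of 1 only on the step out of the rightmost state. The names `random_walk_values`, `n_states`, and `gamma` are made up for illustration.

```python
from fractions import Fraction


def random_walk_values(n_states=5, gamma=Fraction(1)):
    """Solve v(i) = b(i) + (gamma / 2) * (v(i - 1) + v(i + 1)) exactly.

    Assumptions (taken from the answer above, not from any library):
    both terminal states have value 0, and the only non-zero reward is
    r = 1 on the step from the rightmost state into the right terminal.
    """
    half = Fraction(1, 2)
    g = Fraction(gamma)

    # Augmented matrix [I - gamma * P | b] for the non-terminal states.
    rows = []
    for i in range(n_states):
        row = [Fraction(0)] * (n_states + 1)
        row[i] = Fraction(1)
        if i > 0:
            row[i - 1] = -g * half
        if i < n_states - 1:
            row[i + 1] = -g * half
        else:
            # Expected immediate reward from the rightmost state: 1/2 * 1.
            row[n_states] = half
        rows.append(row)

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n_states):
        pivot = next(r for r in range(col, n_states) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(n_states):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]

    return [rows[i][n_states] for i in range(n_states)]


if __name__ == "__main__":
    # Undiscounted case, states A..E.
    print([str(v) for v in random_walk_values()])
    # Discounted case: pass gamma as a Fraction to keep the result exact.
    print([str(v) for v in random_walk_values(gamma=Fraction(9, 10))])
```

With `gamma` below 1 the only change to the system is the factor in front of the neighbouring values; the reward on the final step is immediate, so it is not discounted.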