Comments (3)
 
    
-    Thanks! I was careless.
-    Ah... adding the figure! ![]()
-    From Eq. (3.14) in Sutton's book, the value of the current state is related to the values of its successor states by

     $$v_\pi(s)=\sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r+\gamma v_{\pi}(s')\right], \quad \text{for all } s \in \mathcal{S}.$$

     Here $\pi(a \mid s)=0.5$, $p(s', r \mid s, a)=1$, and $\gamma=1$. The reward is $r=0$ on every transition except the step from E into the right terminal state, where $r=1$, and both terminal states have value 0. This gives

     $$\left\{ \begin{aligned} V(A) &= \frac{1}{2}\cdot 0 + \frac{1}{2}V(B) \\ V(B) &= \frac{1}{2}V(A) + \frac{1}{2}V(C) \\ V(C) &= \frac{1}{2}V(B) + \frac{1}{2}V(D) \\ V(D) &= \frac{1}{2}V(C) + \frac{1}{2}V(E) \\ V(E) &= \frac{1}{2}V(D) + \frac{1}{2}\cdot 1 \end{aligned} \right.$$

     Solving gives $V(A)=\frac{1}{6}$, $V(B)=\frac{2}{6}$, $V(C)=\frac{3}{6}$, $V(D)=\frac{4}{6}$, $V(E)=\frac{5}{6}$. If there is a discount, just change $\gamma$ and substitute it into the same equations.
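For anyone who wants to check the numbers, here is a minimal sketch that solves the same five linear equations exactly. It assumes the setup described in the answer (equiprobable left/right moves, terminal values of 0, a reward of 1 only on the step from E into the right terminal). The function name `random_walk_values` and its parameters are made up for illustration, not taken from the book.

```python
from fractions import Fraction


def random_walk_values(n_states=5, gamma=Fraction(1)):
    """Solve (I - gamma * P) v = b for the equiprobable random walk.

    Assumptions: each non-terminal state moves left or right with
    probability 1/2, terminal states have value 0, and the only nonzero
    reward is +1 on the step from the rightmost state into the right terminal.
    """
    half = Fraction(1, 2)
    n = n_states

    # Build the augmented matrix [I - gamma * P | b], one row per state.
    rows = []
    for i in range(n):
        row = [Fraction(0)] * (n + 1)
        row[i] = Fraction(1)
        if i > 0:
            row[i - 1] -= gamma * half  # move left to a non-terminal state
        if i < n - 1:
            row[i + 1] -= gamma * half  # move right to a non-terminal state
        else:
            row[n] = half  # expected immediate reward: 1/2 * 1
        rows.append(row)

    # Gauss-Jordan elimination with a simple nonzero-pivot search.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [x * inv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]

    # After elimination the last column holds the state values.
    return [row[n] for row in rows]


values = random_walk_values()
print(", ".join(f"V({s}) = {v}" for s, v in zip("ABCDE", values)))
# prints: V(A) = 1/6, V(B) = 1/3, V(C) = 1/2, V(D) = 2/3, V(E) = 5/6
```

To try a discounted version, pass something like `gamma=Fraction(9, 10)`. Only the coefficients on the neighbouring states change, since the reward of 1 is received immediately on the final step.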


