ADPRL - Approximate Dynamic Programming and Reinforcement Learning - Note 8 - Approximate Policy Iteration - 11GX

Note 8 Approximate Policy Iteration
- 8.1 A Generic Framework
    - Lemma 8.1 Error bound under monotonicity
    - Lemma 8.2 Error bound of a single approximate PI sweep
    - Proposition 8.1 Error bound of the approximate PI algorithm
    - Proposition 8.2 Error bounds of approximate PI under convergence in policy space
- 8.2 Approximate Policy Evaluation
    - Definition 8.1 Approximate total cost function
    - Lemma 8.3 Bound on the approximate cost function
    - Proposition 8.3 Bound between the estimate and the true total cost function
- 8.3 Approximate Policy Evaluation with Ergodicity
  - 8.3.1 Ergodic MDP
  - 8.3.2 Mean Squared Projected Bellman Error
- 8.4 API Supplement
  - 8.4.1 Approximate PI (API)
  - 8.4.2 APE via Bellman Residual Minimisation
  - 8.4.3 $\ell_2$ Based Bellman Residual Minimisation
  - 8.4.4 Recap: Closed form policy evaluation
  - 8.4.5 $\ell_2$ Based Bellman Residual Minimisation
  - 8.4.6 Approximate PI (API) with LFA + MSBE
  - 8.4.7 Approximate PI (API) with LFA + $\xi$-weighted MSBE
  - 8.4.8 Mean Squared Projected Bellman Error (MSPBE)
  - 8.4.9 Approximate PI (API) with LFA + $\xi$-weighted MSPBE
  - 8.4.10 Approximate PI Summary
