Policy Gradient Based Semi-Markov Decision Problems : Approximation and Estimation Errors
Abstract
In [1] and [2], we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We approximated the gradient of the average reward, and then proposed a simulation-based algorithm, called GSMDP, that estimates this approximate gradient using only a single sample path of the underlying Markov chain; GSMDP was proved to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the algorithm's output and its asymptotic output, arises because the algorithm sees only a finite data sequence.
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2010-02-01
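
To make the single-sample-path idea in the abstract concrete, below is a minimal, hypothetical sketch of a GPOMDP-style gradient estimator for a small parameterized SMDP. It is not the paper's GSMDP algorithm; the state/action sizes, the softmax policy, the transition, reward, and sojourn-time models, the bias parameter `beta`, and the time-normalization are all illustrative assumptions. The discount-like parameter `beta` governs the bias of the approximate gradient (approximation error), while the finite trajectory length `T` leaves a residual estimation error, mirroring the two error sources the paper bounds.

```python
# Illustrative sketch only (not the paper's GSMDP algorithm): a single-trajectory,
# GPOMDP-style estimator of an approximate average-reward gradient for a small,
# hypothetical parameterized semi-Markov decision process.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 3, 2                      # assumed problem size
theta = np.zeros((n_states, n_actions))         # policy parameters
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))              # expected reward per transition
tau = rng.uniform(0.5, 2.0, size=(n_states, n_actions))            # mean sojourn time

def policy(theta, s):
    """Softmax action distribution in state s (illustrative parameterization)."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def grad_log_policy(theta, s, a):
    """Gradient of log pi(a | s; theta) for the softmax parameterization."""
    g = np.zeros_like(theta)
    pi = policy(theta, s)
    g[s] = -pi
    g[s, a] += 1.0
    return g

def estimate_gradient(theta, T=50_000, beta=0.99):
    """Single-sample-path estimate of an approximate average-reward gradient.

    beta < 1 biases the estimate (approximation error); finite T leaves
    estimation error. Both shrink as beta -> 1 and T -> infinity.
    """
    s = 0
    z = np.zeros_like(theta)          # eligibility trace
    grad = np.zeros_like(theta)       # running average of r_t * z_t
    total_time = 0.0
    for t in range(T):
        a = rng.choice(n_actions, p=policy(theta, s))
        dt = rng.exponential(tau[s, a])            # random sojourn time in (s, a)
        r = R[s, a]
        z = beta * z + grad_log_policy(theta, s, a)
        grad += (r * z - grad) / (t + 1)
        total_time += dt
        s = rng.choice(n_states, p=P[s, a])
    # Crude time-normalization to express the gradient per unit time (SMDP setting).
    return grad / (total_time / T)

g_hat = estimate_gradient(theta)
print(g_hat)
```

The two knobs in this sketch, `beta` and `T`, correspond directly to the approximation and estimation errors discussed in the abstract: increasing `beta` toward 1 reduces the bias of the approximate gradient at the cost of higher variance, and increasing `T` reduces the gap between the finite-sample output and its asymptotic limit.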
Authors
-
Vien Ngo
Artificial Intelligence Laboratory, Department of Computer Engineering, Kyung Hee University
-
LEE SeungGwan
Artificial Intelligence Laboratory, Department of Computer Engineering, Kyung Hee University
-
CHUNG TaeChoong
Artificial Intelligence Laboratory, Department of Computer Engineering, Kyung Hee University
Related papers
- Policy Gradient SMDP for Resource Allocation and Routing in Integrated Services Networks