Untitled
Abstract
This paper is concerned with an approach to Markovian decision processes with discounted rewards in which the transition probabilities are unknown. The processes are assumed to be finite-state, discrete-time, and stationary. The decision rules presented in the paper give a policy for estimating the transition probabilities successively, from the viewpoint of dual control, and this policy leads to an optimal policy. The decision processes may be regarded as a model of learning processes.

To begin with, the assurance set with level γ, S(γ), which represents the uncertainty of the transition probability matrix, is introduced; S(γ) is recomputed each time from the current estimate of the transition probability matrix. Two learning algorithms based on S(γ) are then considered.

In the first algorithm, a max-max optimal policy based on S(γ) is used at every step under a fixed γ. It is shown that this algorithm converges to a policy that is optimal with a certain probability related to γ. The second algorithm, in which γ is increased successively, uses a probabilistic policy for more efficient estimation, whereas the first algorithm uses a deterministic policy. The second algorithm leads to an optimal policy in the sense that the probability of using the optimal policy converges to one.

Finally, the paper discusses a simplification of the second algorithm.
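The max-max step described above (choosing, within the uncertainty set around the estimated transition matrix, the transitions most favorable to the current value estimate, then acting greedily) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm: the L1-ball stand-in for S(γ), the function names `optimistic_p` and `max_max_policy`, and all parameter choices are assumptions introduced for the example.

```python
import numpy as np

def optimistic_p(p_hat, radius, v):
    """Inner maximization of the max-max step: over an L1-ball of the given
    radius around the estimate p_hat, pick the distribution maximizing p @ v."""
    p = p_hat.copy()
    order = np.argsort(v)              # states sorted by value, ascending
    best = order[-1]
    p[best] = min(1.0, p_hat[best] + radius / 2.0)  # shift mass to best state
    excess = p.sum() - 1.0
    for s in order:                    # remove the excess from low-value states
        if excess <= 0.0:
            break
        take = min(p[s], excess) if s != best else 0.0
        p[s] -= take
        excess -= take
    return p

def max_max_policy(counts, rewards, discount, radius, n_iter=200):
    """Optimistic value iteration: counts[s, a, s'] are observed transitions,
    rewards[s, a] are known rewards; returns a greedy policy and values."""
    n_states, n_actions = rewards.shape
    p_hat = (counts + 1e-9) / (counts + 1e-9).sum(axis=2, keepdims=True)
    v = np.zeros(n_states)
    for _ in range(n_iter):
        q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p = optimistic_p(p_hat[s, a], radius, v)
                q[s, a] = rewards[s, a] + discount * (p @ v)
        v = q.max(axis=1)
    return q.argmax(axis=1), v
```

In the paper's first algorithm the radius (derived from the fixed level γ) stays constant; the second algorithm would instead tighten the set as γ increases, and mix in exploratory actions so that all transitions keep being estimated.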
- Article published by the Society of Instrument and Control Engineers (SICE)
- Self-Excited Oscillation of Relay-Type Sampled-Data Feedback Control System
- Untitled
- Mold Level Control for a Continuous Casting Machine Using an Electrode-Type Mold-Level Detector
- Assessment and Control of Noise: Pollution by Noise from General Sources
- Information Network System and Home Automation