RF-005 Prioritized Sweeping in Stochastic Environments (Artificial Intelligence and Games, Refereed Paper)

Abstract

Model-based learning can re-evaluate the utility of every state according to a measure of urgency. Prioritized sweeping is a typical algorithm for updating states efficiently in this way. In a stochastic environment, a probability distribution can represent the uncertainty in a Q-value caused by probabilistic state transitions or probabilistic rewards. If the expected value of a reward is assumed to be normally distributed, the distribution of the value function in the initial learning stage can be approximated by a t-distribution, because this is equivalent to random sampling from a normal distribution. A confidence interval calculated from this distribution for each state-action pair indicates which states are insufficiently explored. In this paper, the product of the confidence interval and the Bellman error is used as the prioritization measure, so that it reflects both the level of confidence and the urgency of the update. The performance of this approach in the trap domain is examined and compared with that of the ordinary sweeping method. Experimental results indicate that the proposed approach explores the state space more effectively than conventional sweeping methods.
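
The priority measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of a two-sided 95% interval, the use of the interval half-width as "the confidence interval", the cap on unbounded widths, and the treatment of `samples` as observed backup targets for one state-action pair are all assumptions made for illustration.

```python
import heapq
import math
import statistics

# Two-sided 95% critical values of Student's t for 1..10 degrees of freedom.
T_975 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228]


def t_critical(df):
    """Return the 95% t critical value (assumption: beyond the table,
    fall back to the normal value 1.96 as a rough approximation)."""
    return T_975[df - 1] if df <= len(T_975) else 1.96


def ci_half_width(samples):
    """Half-width of the t-based confidence interval of the sample mean."""
    n = len(samples)
    if n < 2:
        return float("inf")  # a single observation gives no variance estimate
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 divisor)
    return t_critical(n - 1) * s / math.sqrt(n)


def priority(samples, bellman_error, cap=1e6):
    """Priority = confidence-interval width * |Bellman error|.
    The cap (an assumption) keeps unexplored pairs finite and avoids inf * 0."""
    width = min(ci_half_width(samples), cap)
    return width * abs(bellman_error)


def build_queue(stats):
    """Build a max-priority queue from {(state, action): (samples, error)}."""
    heap = []
    for key, (samples, err) in stats.items():
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(heap, (-priority(samples, err), key))
    return heap


stats = {
    ("s0", "a"): ([1.0, 1.0, 1.0, 1.0], 0.5),  # well explored, no spread
    ("s1", "a"): ([0.0, 2.0], 0.5),            # few, scattered observations
}
queue = build_queue(stats)
first = heapq.heappop(queue)[1]
```

With equal Bellman errors, the pair with fewer and more scattered observations is popped first, which is the behavior the abstract attributes to weighting urgency by the level of confidence.
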
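Paper of the FIT Steering Committee (Institute of Electronics, Information and Communication Engineers; Information Processing Society of Japan)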
2008-08-20