部分観測環境での強化学習へのモデルベースアプローチ : 可変長記憶モデルのベイズ学習

概要

論文の詳細を見る
Most of the reinforcement learning (RL) algorithms assume that the learning processes of embedded agents can be formulated as Markov Decision Processes (MDPs). However, the assumption is not valid for many realistic problems. Therefore, research on RL techniques for non-Markovian environments is gaining more attention recently. We have developed a Bayesian approach to RL in non-Markovian environments, in which the environment is modeled as a history tree model, a stochastic model with variable memory length. In our approach, given a class of history trees, the agent explores the environment and learns the maximum a posteriori (MAP) model on the basis of Bayesian Statistics. The optimal policy can be computed by Dynamic Programming, after the agent has learned the environment model. Unlike many other model learning techniques, our approach does not suffer from the problems of noise and overfitting, thanks to the Bayesian framework. We have analyzed the asymptotic behavior of the proposed algorithm and have proved that if the given class contains the exact model of the environment, the model learned by our algorithm converges to it. We also present the results of our experiments in two non-Markovian environments.
社団法人人工知能学会の論文
1998-05-01