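11912 Introduction and Evaluation of a Penalty Basis Degree Threshold Determination Mechanism for the Improved Penalty Avoiding Rational Policy Making Algorithm (OS7 Robotics and Mechatronics (3), Organized Session)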

Abstract

The Penalty Avoiding Rational Policy Making algorithm (PARP) is based on the Profit Sharing method and was designed to learn a penalty-avoiding policy. PARP has since been improved to save memory and to cope with uncertainty. The efficiency of the Improved Penalty Avoiding Rational Policy Making algorithm is strongly influenced by the threshold γ of the penalty basis function. Until now, an appropriate γ has had to be set through a preliminary experiment. In this paper, we propose a technique for learning γ with the multi-start method. The proposed technique is applied to a keepaway task, a benchmark in robotic soccer, to confirm its effectiveness.
Paper of the Japan Society of Mechanical Engineers (general incorporated association)
2010-03-09