Compound Reinforcement Learning


Abstract

This paper describes a reinforcement learning framework based on compound returns, called compound reinforcement learning. Compound reinforcement learning maximizes the compound return in returns-based MDPs. We also describe the compound Q-learning algorithm. We present experimental results on an illustrative example, the two-armed bandit problem.
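The abstract contrasts maximizing a compound return with the usual objective of maximizing a sum of rewards, and illustrates this with a two-armed bandit. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes each reward is a rate of return r, that wealth is multiplied by (1 + r) at every step, and that maximizing the compound return amounts to maximizing the expected sum of log(1 + r). Under that assumption, the compound variant of Q-learning is ordinary Q-learning applied to log(1 + r). The payoffs and the names `ARMS`, `reward_of`, `expected_value`, `pull` and `train` are hypothetical; the paper's returns-based MDP formulation (for example, its treatment of discounting) may differ.

```python
import math
import random

# Hypothetical two-armed bandit; payoffs are illustrative, NOT taken from the paper.
# Each arm yields a rate of return r, and wealth is multiplied by (1 + r).
ARMS = [
    [(0.5, 1.5), (0.5, -0.9)],  # arm 0: risky, mean rate +0.30, mean log growth log(0.5)
    [(1.0, 0.01)],              # arm 1: safe, always +1%
]


def reward_of(r, compound):
    """Raw rate of return for ordinary Q-learning; log(1 + r) for the assumed compound variant."""
    return math.log1p(r) if compound else r


def expected_value(arm, compound):
    """Exact expected per-step reward of an arm under either objective."""
    return sum(p * reward_of(r, compound) for p, r in ARMS[arm])


def pull(arm, rng):
    """Sample a rate of return from the chosen arm."""
    u, cumulative = rng.random(), 0.0
    for p, r in ARMS[arm]:
        cumulative += p
        if u < cumulative:
            return r
    return ARMS[arm][-1][1]  # guard against floating-point round-off


def train(compound, steps=20_000, epsilon=0.1, seed=0):
    """Epsilon-greedy action-value learning on a single-state bandit.

    A bandit episode ends after one pull, so the Q-learning target reduces to
    the immediate reward; the step size 1/N(a) makes Q(a) a sample average.
    """
    rng = random.Random(seed)
    q, n = [0.0, 0.0], [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(2)  # explore
        else:
            a = max(range(2), key=lambda i: q[i])  # exploit (ties go to arm 0)
        n[a] += 1
        q[a] += (reward_of(pull(a, rng), compound) - q[a]) / n[a]
    return q


if __name__ == "__main__":
    for compound in (False, True):
        q = train(compound)
        best = max(range(2), key=lambda i: q[i])
        label = "compound" if compound else "ordinary"
        print(label, [round(v, 3) for v in q], "-> arm", best)
```

With these made-up payoffs, arm 0 has the higher expected rate of return (0.30 versus 0.01), but its expected log growth is log(0.5) < 0, so repeatedly pulling it shrinks wealth. The ordinary learner should therefore prefer arm 0, while the log-reward learner should prefer the safe arm 1, which is the kind of difference a compound-return objective is meant to capture.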

Author

松井藤五郎 (Tohgoroh Matsui)
College of Life and Health Sciences and College of Engineering, Chubu University

