Reinforcement Learning Based on Intrinsic Motivation and Temporal Abstraction via Transformation Invariance

Abstract

Bottom-up processes have received much attention in research on unsupervised and developmental learning. In contrast, this paper discusses the effectiveness of top-down deeming for the acquisition of adaptive behavior. A successful past experience, or a skill that is expected to be reusable in a novel environment, is stored in memory. Abstract environment recognition via geometric transformation invariance is then introduced to measure the reproducibility of an executed skill in a novel environment. In addition, the reproducibility of a skill in the environment is used to form an intrinsic motivation that drives the agent toward active conceptualization of the search space. This enables the agent to relativize its current skill execution robustly across diverse environments. Useful characteristics of the top-down deeming process are implemented in reinforcement learning and examined through simulation experiments in a grid world. The results demonstrate that active conceptualization of the environment accelerates learning. Experiments in scaled environments further show that subjective anticipation can yield a consistent strategy for exploration and exploitation. Eligibility traces are also introduced to address the skill utility problem, and the traces over actions and skills are shown to preserve learning performance across diverse skill settings.
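
As a rough illustration of what recognition via geometric transformation invariance could look like in a grid world, the sketch below treats two local grid patches as the same abstract situation when one can be obtained from the other by a rotation or reflection. The abstract does not state which transformations the paper uses, so the choice of the eight symmetries of the square, the patch encoding (0 for free, 1 for wall), and all function names here are assumptions made for illustration only, not the authors' method.

```python
from typing import List, Tuple

# Hypothetical encoding: a local patch of the grid world, 0 = free cell, 1 = wall.
Grid = Tuple[Tuple[int, ...], ...]


def rotate90(g: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return tuple(zip(*g[::-1]))


def flip(g: Grid) -> Grid:
    """Mirror a grid left to right."""
    return tuple(row[::-1] for row in g)


def symmetries(g: Grid) -> List[Grid]:
    """Return the 8 rotations/reflections of a grid (assumed transformation set)."""
    out = []
    cur = g
    for _ in range(4):
        out.append(cur)
        out.append(flip(cur))
        cur = rotate90(cur)
    return out


def canonical(g: Grid) -> Grid:
    """Pick one representative per symmetry class (lexicographic minimum)."""
    return min(symmetries(g))


def same_up_to_symmetry(a: Grid, b: Grid) -> bool:
    """True if b is a rotated and/or mirrored copy of a."""
    return canonical(a) == canonical(b)


# A patch stored with a skill (wall to the north) and patches seen later.
stored = ((0, 1, 0), (0, 0, 0), (0, 0, 0))
novel = ((0, 0, 0), (0, 0, 1), (0, 0, 0))   # wall to the east
other = ((1, 1, 0), (0, 0, 0), (0, 0, 0))   # two wall cells
print(same_up_to_symmetry(stored, novel))
print(same_up_to_symmetry(stored, other))
```

Under these assumptions, a stored skill could be looked up by the canonical form of the patch in which it succeeded, so that a rotated or mirrored version of the same layout in a novel environment maps to the same memory entry. How the paper actually scores reproducibility and turns it into an intrinsic reward is not specified in the abstract and is not modeled here.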