Labeling Q-Learning in POMDP Environments
Abstract
This paper presents a new Reinforcement Learning (RL) method, called "Labeling Q-learning (LQ-learning)," for solving partially observable Markov decision process (POMDP) problems. Hierarchical RL methods have recently been widely studied, but they have the drawback that learning time and memory are consumed merely to maintain the hierarchical structure, even when that structure is not needed. Our LQ-learning, by contrast, has no hierarchical structure; instead, it adopts a new type of internal memory mechanism. In LQ-learning, the agent perceives the current state as a pair of an observation and its label, which lets it distinguish states that look identical but are in fact different. That is, at each step t we define a new type of perception of the environment, õ_t = (o_t, θ_t), where o_t is the conventional observation and θ_t is the label attached to o_t. The classical RL algorithm is then applied as if the pair (o_t, θ_t) were a Markov state. The labeling is carried out by a Boolean variable called "CHANGE" and a hash-like or mod function called the Labeling Function (LF). To demonstrate the efficiency of LQ-learning, we apply it to maze problems in grid worlds, which are widely used in the literature as simulated POMDP environments. Using LQ-learning, the maze problems can be solved without initial knowledge of the environment.
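The abstract only outlines the mechanism, so the sketch below is illustrative rather than the authors' implementation: it assumes CHANGE is triggered by revisiting an observation within an episode and that the Labeling Function cycles labels with a mod function over a small label set. All identifiers (`LQAgent`, `perceive`, `n_labels`, and so on) are hypothetical.

```python
import random
from collections import defaultdict

class LQAgent:
    """Minimal sketch of Labeling Q-learning (LQ-learning).

    Q-values are keyed on the labeled observation (o_t, theta_t) instead of
    the raw observation o_t, so aliased states can be told apart.
    """

    def __init__(self, n_actions, n_labels=4, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)     # Q[((obs, label), action)]
        self.n_actions = n_actions
        self.n_labels = n_labels        # range of the mod-based Labeling Function
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.labels = defaultdict(int)  # current label attached to each observation
        self.seen = set()               # observations seen this episode

    def perceive(self, obs):
        """Return the labeled observation (o_t, theta_t) used as a Markov state."""
        change = obs in self.seen       # assumed CHANGE rule: revisit detected
        self.seen.add(obs)
        if change:
            # assumed Labeling Function: cycle the label with a mod function
            self.labels[obs] = (self.labels[obs] + 1) % self.n_labels
        return (obs, self.labels[obs])

    def act(self, state):
        """Epsilon-greedy action selection over the labeled state."""
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Classical Q-learning update applied to labeled observations."""
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        td = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td

    def reset_episode(self):
        self.seen.clear()
        self.labels.clear()
```

A grid-world driver would call `reset_episode()` at the start of each episode and then loop: `perceive` the observation, `act`, step the environment, and `update` with the perceived next state.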
- A paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2002-09-01
Authors
-
Abe Kenichi
Dept. of Elec. & Comm. Eng., School of Eng., Tohoku Univ.
-
Lee Haeyeon
Dept. of Elec. & Comm. Eng., School of Eng., Tohoku Univ.
-
Kamaya Hiroyuki
Dept. of Electrical Eng., Hachinohe National College of Technology