Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment
Abstract
Reinforcement learning (RL) is an efficient method for solving Markov decision processes (MDPs) without a priori knowledge of the environment, and RL methods can be classified as exploitation-oriented or exploration-oriented. Q-learning is a representative RL method and is classified as exploration-oriented. Although it is guaranteed to obtain an optimal policy, Q-learning needs numerous trials to learn that policy because it has no action-selection mechanism. To accelerate the learning rate of Q-learning and to realize both exploitation and exploration during learning, the Q-ee learning system has been proposed; it uses a pre-action-selector, an action-selector, and back-propagation of Q values to improve the performance of Q-learning. However, Q-ee learning is suitable only for deterministic MDPs, and its convergence to an optimal policy has not been proved. In this paper, based on a discussion of different exploration methods, we replace the pre-action-selector of Q-ee learning with Active Exploration Planning (AEP), a method for actively exploring an environment, and call the resulting learning system Q-ae learning. With this replacement, Q-ae learning not only retains the advantages of Q-ee learning but is also adapted to stochastic environments. Moreover, for deterministic MDPs, this paper presents the convergence condition under which an agent obtains the optimal policy with Q-ae learning, together with its proof. Further, discussion and experiments show that, in a stochastic environment, the exploration process can be controlled by adjusting the relation between the learning factor and the discount rate. Experimental results on the rate at which the environment is explored and on the correct rate of the learned policies also illustrate the efficiency of Q-ae learning in stochastic environments.
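The baseline the abstract starts from, one-step tabular Q-learning, can be sketched on a small deterministic MDP. This is a minimal sketch under stated assumptions, not the paper's Q-ae method: the chain environment, the function names (`step`, `q_learning`, `optimal_q`), and all parameter values are hypothetical. Actions are drawn uniformly at random to stand in for an exploration-oriented learner without an action-selection mechanism; AEP, the pre-action-selector, and back-propagation of Q values are not modeled. Each step applies the standard update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], with learning factor α and discount rate γ, and the result is compared against the closed-form optimal Q* of the chain.

```python
import random


def step(state, action, n_states):
    """One transition of a hypothetical deterministic chain MDP.

    States are 0..n_states-1 and the last state is a terminal goal.
    Action 0 moves left (state 0 stays put), action 1 moves right.
    Entering the goal yields reward 1.0; every other transition yields 0.0.
    """
    goal = n_states - 1
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal


def q_learning(n_states=5, episodes=3000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular one-step Q-learning with uniformly random action selection.

    alpha is the learning factor and gamma the discount rate. Random actions
    stand in for an exploration-oriented learner; this is NOT the Q-ee or
    Q-ae action-selection scheme.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            action = rng.randrange(2)
            next_state, reward, done = step(state, action, n_states)
            # A terminal next state contributes no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def optimal_q(n_states=5, gamma=0.9):
    """Closed-form Q* for the chain: from state s the goal is goal - s steps away."""
    goal = n_states - 1
    # V*(s) = gamma ** (goal - s - 1) for non-terminal s; V*(goal) = 0.
    v = [gamma ** (goal - s - 1) for s in range(goal)] + [0.0]
    q = []
    for s in range(n_states):
        if s == goal:
            q.append([0.0, 0.0])  # terminal state: never updated
            continue
        left = max(s - 1, 0)
        q_left = gamma * v[left]
        q_right = 1.0 if s + 1 == goal else gamma * v[s + 1]
        q.append([q_left, q_right])
    return q


if __name__ == "__main__":
    learned = q_learning()
    target = optimal_q()
    max_err = max(
        abs(a - b) for row_l, row_t in zip(learned, target) for a, b in zip(row_l, row_t)
    )
    greedy = [max(range(2), key=lambda a: learned[s][a]) for s in range(len(learned) - 1)]
    print("greedy actions (1 = right):", greedy)
    print("max |Q - Q*|:", max_err)
```

Because the chain is deterministic, the learned table can be checked directly against Q*. With purely random actions the learner converges here only after thousands of episodes, which illustrates the many trials a learner without action selection can need; the sketch makes no claim about how Q-ae's convergence condition is stated.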
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2000-09-25
Authors
- TATSUMI Shoji, Faculty of Engineering, Osaka City University
- Tatsumi S, Osaka City Univ., Osaka-shi, Japan
- Zhao Gang, Fujitsu Kansai-chubu Net-tech Limited
- Zhao Gang, National Astronomical Observatories, Chinese Academy of Sciences, Beijing, China
- SUN Ruoying, Faculty of Engineering, Osaka City University
- ZHAO Gang, Faculty of Engineering, Osaka City University
- SUN Ruoying, College of Industry and Commerce Management, Liaoning University
Related papers
- Parallel Genetic Algorithm for Constrained Clustering
- Parallel Genetic Algorithms Based on a Multiprocessor System FIN and Its Application
- Boltzmann Machine and Parallel Genetic Algorithms Based on the Fin
- A PARALLEL IMPLEMENTATION OF THE LEARNING CLASSIFIER SYSTEMS ON THE FIN-1
- Substellar Companions to Evolved Intermediate-Mass Stars : HD 145457 and HD 180314
- Detection of Small-Amplitude Oscillations in the G-Giant HD 76294 (ζ Hydrae)
- Calculation of Photoionized Plasmas with a Detailed-Configuration-Accounting Atomic Model
- Multiagent Cooperating Learning Methods by Indirect Media Communication (Neural Networks and Bioengineering)
- On the Spectroscopic Determination of Atmospheric Parameters and O/Fe Abundances of RR Lyrae Stars
- Multiagent Cooperating Learning Methods by Indirect Media Communication
- Na I D Lines in the SN 2002ap Spectrum
- On the Abundance of Potassium in Metal-Poor Stars
- α Element Abundances in Mildly Metal-Poor Stars
- Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment
- RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate
- Q-ee Learning : A Novel Q-Learning Method with Exploitation and Exploration
- An Accelerated k-Certainty Exploration Method
- Electron Impact Excitation of Ti XVIII
- Applying Genetic Algorithm to Conceptual Clustering
- Algorithms for Matrix Multiplication and the FFT on a Processor Array with Separable Buses (Regular Section)
- Solving an All-Pairs Shortest Paths Problem on a Processor Array with Separable Buses
- A Pattern Defect Inspection Method by Grayscale Image Comparison without Precise Image Alignment
- Electron Impact Excitation of N-like Ca XIV