CHQ : A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes(Artificial Intelligence and Cognitive Science)
Abstract
In this paper, we propose a new reinforcement learning scheme called CHQ that can efficiently acquire appropriate policies under partially observable Markov decision processes (POMDPs) involving probabilistic state transitions. Such processes frequently occur in multi-agent systems in which each agent independently takes a probabilistic action based on a partial observation of the underlying environment. The key idea of CHQ is to extend the HQ-learning proposed by Wiering et al. so that it learns the activation order of the MDP subtasks as well as an appropriate policy for each MDP subtask. The effectiveness of the proposed scheme is evaluated experimentally. The experimental results suggest that it can acquire a deterministic policy with a sufficiently high success rate, even if the given task is a POMDP with probabilistic state transitions.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-05-01
Authors
- FUJITA Satoshi, Graduate School of Engineering, Hiroshima University
- OSADA Hiroshi, Takahama Plant Logistics Engineering Center, Toyota Industries Corporation
Related papers
- A Fault-Tolerant Content Addressable Network(Networks)
- Distributed Zone Partitioning Schemes for CAN and Its Application to the Load Balancing in Pure P2P Systems (Special Issue: Distributed Processing and Networks in a New Era (Web Services and P2P))
- Semi-Dynamic Multiprocessor Scheduling with an Asymptotically Optimal Performance Ratio
- An Efficient Scheduling Scheme for Assigning Transmission Opportunity in QoS-Guaranteed Wireless LAN
- A Localization Scheme for Sensor Networks Based on Wireless Communication with Anchor Groups(Challenges in Ad-hoc and Multi-hop Wireless Communications)
- Collision Avoidance of Multiple Autonomous Mobile Robots Using Learning
- CHQ : A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes(Artificial Intelligence and Cognitive Science)
- On Some Computational Aspect of Point Configurations in the Euclidean Space
- A Generic Solver Based on Functional Parallelism for Solving Combinatorial Optimization Problems(Distributed Cooperation and Agents)
- SwRED: a robust active queue management scheme based on load level prediction (Information Networks)
- A New Caching Technique to Support Conjunctive Queries in P2P DHT
- A Greedy Multicast Algorithm in k-Ary n-Cubes and Its Worst Case Analysis (Special Issue on Selected Papers from LA Symposium)
- Importance of intracellular Fe pools on growth of marine diatoms by using unialgal cultures and on the Oyashio region phytoplankton community during spring
- Prevent Contents Leaking in P2P CDNs with Robust and Quick Detection of Colluders
- Autonomous Multi-Source Multi-Sink Routing in Wireless Sensor Networks
- Special Section on Discrete Mathematics and Its Applications
- Reputation-Based Colluder Detection Schemes for Peer-to-Peer Content Delivery Networks
- A Reputation Management Scheme for Peer-to-Peer Networks based on the EigenTrust Trust Management Algorithm
- Distributed Zone Partitioning Schemes for CAN and Its Application to the Load Balancing in Pure P2P Systems