A statistical property of multiagent learning based on Markov decision process
Abstract
We exhibit an important property, the asymptotic equipartition property (AEP), of empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we establish a statistical property of multiagent learning methods such as reinforcement learning (RL) near the end of the learning process. We examine how the conditions among the agents affect the achievement of a cooperative policy in three cases: blind, visible, and communicable. We also derive a bound on the speed at which the empirical sequence converges in probability to the best sequence, so that multiagent learning yields the best cooperative result.
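For readers unfamiliar with the AEP, the following is a minimal sketch of the classical AEP for an ergodic Markov chain, not the authors' multiagent construction. It assumes a hypothetical two-state chain standing in for the state process induced by a fixed joint policy. For such a chain started from its stationary distribution, the normalized negative log-probability of a sampled path, -(1/n) log p(x_1, ..., x_n), converges to the entropy rate H = -Σ_i μ_i Σ_j P_ij log P_ij. Consequently, almost all probability mass concentrates on roughly e^{nH} "typical" sequences. The matrix `P` and all function names are illustrative assumptions.

```python
import math
import random

# Hypothetical two-state ergodic chain (illustrative assumption), e.g. the
# state process induced by a fixed joint policy in a multiagent MDP.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P):
    """Stationary distribution of a 2-state chain: mu_0 = p10 / (p01 + p10)."""
    p01, p10 = P[0][1], P[1][0]
    mu0 = p10 / (p01 + p10)
    return [mu0, 1.0 - mu0]

def entropy_rate(P):
    """Entropy rate H = -sum_i mu_i sum_j P_ij log P_ij, in nats per step."""
    mu = stationary(P)
    return -sum(mu[i] * sum(p * math.log(p) for p in P[i] if p > 0)
                for i in range(len(P)))

def sample_path(P, n, rng):
    """Sample n states (2-state chain only), starting from stationarity."""
    mu = stationary(P)
    x = 0 if rng.random() < mu[0] else 1
    path = [x]
    for _ in range(n - 1):
        x = 0 if rng.random() < P[x][0] else 1
        path.append(x)
    return path

def empirical_rate(P, path):
    """-(1/n) log p(x_1..x_n) for a path whose first state is drawn from mu."""
    mu = stationary(P)
    logp = math.log(mu[path[0]])
    for a, b in zip(path, path[1:]):
        logp += math.log(P[a][b])
    return -logp / len(path)

if __name__ == "__main__":
    rng = random.Random(0)
    H = entropy_rate(P)
    # The empirical rate should approach H as n grows (the AEP).
    for n in (10, 100, 1000, 10000, 100000):
        path = sample_path(P, n, rng)
        print(n, round(empirical_rate(P, path), 4), round(H, 4))
```

Running the script prints the empirical rate next to H for increasing n. The gap shrinks as n grows, which is the concentration behaviour that AEP-based analyses of long empirical sequences rely on.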
- IEEE papers
- Magnetic and Transport Properties of Nb/PdNi Bilayers
- Supersonic Ion Beam Driven by Permanent-Magnets-Induced Double Layer in an Expanding Plasma
- Surfactant Adsorption on Single-Crystal Silicon Surfaces in TMAH Solution: Orientation-Dependent Adsorption Detected by In Situ Infrared Spectroscopy
- Extended-range FMCW reflectometry using an optical loop with a frequency shifter
- Teachingless spray-painting of sculptured surface by an industrial robot