The Asymptotic Equipartition Property in Reinforcement Learning and its Relation to Return Maximization
Abstract
We discuss an important property of empirical sequences in reinforcement learning, the asymptotic equipartition property. It states that, if the number of time steps is sufficiently large, the typical set of empirical sequences has probability close to one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is an exponential function of the sum of conditional entropies. This sum is referred to as the stochastic complexity. Using the property, we show that return maximization depends on two factors: the stochastic complexity and a quantity that depends on the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which serves as a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability.
- Elsevier paper
- 2006-01-01
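
The three claims of the asymptotic equipartition property stated in the abstract can be checked numerically on a toy case. The sketch below is only an illustration under a strong simplifying assumption: it replaces the empirical sequences of reinforcement learning with n independent draws from a Bernoulli(p) source, so a single entropy H(p) stands in for the sum of conditional entropies. The function names `entropy` and `typical_set_stats` and the parameters `p`, `n`, and `eps` are hypothetical and do not come from the paper.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) source, for 0 < p < 1."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def typical_set_stats(p, n, eps):
    """Return (probability mass, size) of the eps-typical set.

    Assumption: sequences are n i.i.d. Bernoulli(p) draws, a stand-in for
    the empirical sequences discussed in the abstract.
    A sequence x is eps-typical when |-(1/n) * log2 P(x) - H(p)| <= eps.
    """
    h = entropy(p)
    mass, size = 0.0, 0
    # All sequences with k ones have the same probability, so we loop over
    # k instead of enumerating all 2**n sequences.
    for k in range(n + 1):
        log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_prob / n - h) <= eps:
            count = math.comb(n, k)  # number of sequences with k ones
            size += count
            mass += count * 2.0 ** log_prob
    return mass, size


if __name__ == "__main__":
    p, n, eps = 0.3, 1000, 0.05
    mass, size = typical_set_stats(p, n, eps)
    print("probability of typical set:", mass)
    print("log2(size) / n:", math.log2(size) / n)
    print("entropy H(p):", entropy(p))
```

With these settings the typical set should carry almost all of the probability while its size stays below 2^(n(H(p)+eps)), far fewer than the 2^n possible sequences. Every member has probability within a factor of 2^(±n·eps) of 2^(-n·H(p)), which is the sense in which the elements are nearly equiprobable.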
Authors
- SAKAI Hideaki, Department of Applied Physics, University of Tokyo
- Iwata Kazunori, Department of Systems Science, Graduate School of Informatics, Kyoto University
- Ikeda Kazushi, Department of Systems Science, Graduate School of Informatics, Kyoto University
- Sakai Hideaki, Department of Systems Science, Graduate School of Informatics, Kyoto University
- Sakai H, Kyoto Univ., Kyoto-shi, Jpn
- Ikeda K, Kyoto Univ., Kyoto-shi, Jpn
- IWATA Kazunori, Faculty of Information Sciences, Hiroshima City University
- Sakai Hideaki, Department of Applied Mathematics and Physics, Faculty of Engineering, Kyoto University
Related papers
- Dopant-Dependent Impact of Mn-Site Doping on Critical-State Manganites R_Sr_MnO_3 (R=La, Nd, Sm, and Gd) (Condensed matter: electronic structure and electrical, magnetic, and optical properties)
- Fluorescence Microscopic Demonstration of Cathepsin K Activity as the Major Lysosomal Cysteine Proteinase in Osteoclasts
- A Case in Which Stent Insertion is Considered to Have Triggered Contrast Medium-Induced Coronary Vasospasm
- Characterization of Phagosomal Subpopulations along Endocytic Routes in Osteoclasts and Macrophages
- Increased Circulating Levels of Insulin-Like Growth Factor-I and Decreased Circulating Levels of Insulin-Like Growth Factor Binding Protein-1 in Postmenopausal Women with Endometrial Cancer
- The Regulation of Bone Resorption in Tooth Formation and Eruption Processes in Mouse Alveolar Crest Devoid of Cathepsin K
- Changes in Expression of Osteocalcin, Matrix Gla Protein and Osteopontin mRNAs during Experimental Tooth Movement in Rats
- Antigenicity of pro-osteocalcin in hard tissue: the authenticity to visualize osteocalcin-producing cells
- Rapid Degradation of Cathepsin E in Pre-Golgi Compartments after Tunicamycin Treatment
- Processing of NH_2- and COOH-terminal peptides of rat osteocalcin by cathepsin B and cathepsin L
- Processing of Osteocalcin by Cysteine Proteinases Alters the Intrinsic Calcium Binding Capability
- Biochemical Properties of the Monomeric Mutant of Human Cathepsin E Expressed in Chinese Hamster Ovary Cells: Comparison with Dimeric Forms of the Natural and Recombinant Cathepsin E
- Propagation Characteristics of Boolean Functions and Their Balancedness
- Blind Identification of Multichannel Systems by Exploiting Prior Knowledge of the Channel (Special Section on Digital Signal Processing)
- Studies on the Convergence Speed of Over-Sampled Subband Adaptive Digital Filters (Special Section on Digital Signal Processing)
- The Asymptotic Equipartition Property in Reinforcement Learning and its Relation to Return Maximization
- On the Effects of Domain Size and Complexity in Empirical Distribution of Reinforcement Learning (Artificial Intelligence and Cognitive Science)
- A Case of Transverse Colon Cancer Secondarily Involving the Liver, Duodenum, and Pancreas
- A synfire chain in layered coincidence detectors with random synaptic delays
- Recent Trends in Random Data Analysis