A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES
Abstract
This study is concerned with finite Markov decision processes (MDPs) whose states are exactly observable but whose transition matrices are unknown. We develop a learning algorithm of the reward-penalty type for the communicating case of multi-chain MDPs. Under the average expected reward criterion, we construct an adaptively optimal policy and an asymptotic sequence of adaptive policies with nearly optimal properties. A numerical experiment is also given to show the practical effectiveness of the algorithm.
- Paper from the Research Association of Statistical Sciences
- 2007-12-00
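To make the setting in the abstract concrete, the following is a minimal sketch of a generic certainty-equivalence approach: estimate the unknown transition matrix from observed transitions, then compute a policy for the estimated model under the average reward criterion. This is not the reward-penalty algorithm of the paper, whose details the abstract does not give. The function names, the toy two-state MDP, the sample data, the uniform fallback for unvisited state-action pairs, and the use of relative value iteration (which assumes the estimated chains are aperiodic) are all assumptions made for illustration.

```python
def estimate_transitions(samples, n_states, n_actions):
    """Maximum-likelihood estimate of P[s][a][s'] from observed (s, a, s') triples.

    Assumption: a state-action pair that was never observed gets a uniform row.
    """
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in samples:
        counts[s][a][s_next] += 1
    P = []
    for s in range(n_states):
        rows = []
        for a in range(n_actions):
            total = sum(counts[s][a])
            if total == 0:
                rows.append([1.0 / n_states] * n_states)
            else:
                rows.append([c / total for c in counts[s][a]])
        P.append(rows)
    return P


def relative_value_iteration(P, r, n_iter=10000, tol=1e-10):
    """Approximate the optimal average reward (gain) and a greedy policy.

    State 0 serves as the reference state, so the bias vector h keeps h[0] == 0.
    """
    n_states, n_actions = len(P), len(P[0])
    h = [0.0] * n_states
    gain, policy = 0.0, [0] * n_states
    for _ in range(n_iter):
        # One-step lookahead value of every (state, action) pair.
        q = [[r[s][a] + sum(P[s][a][t] * h[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        best = [max(q[s]) for s in range(n_states)]
        policy = [q[s].index(best[s]) for s in range(n_states)]
        gain = best[0]                      # value at the reference state
        new_h = [v - gain for v in best]    # renormalise so that h[0] == 0
        done = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if done:
            break
    return gain, policy


# Hypothetical observations (state, action, next_state) from a 2-state, 2-action MDP.
samples = (
    [(0, 0, 0)] * 3 + [(0, 0, 1)] * 1 +
    [(0, 1, 1)] * 2 +
    [(1, 0, 0)] * 1 + [(1, 0, 1)] * 1 +
    [(1, 1, 1)] * 3 + [(1, 1, 0)] * 1
)
rewards = [[1.0, 0.0], [0.0, 2.0]]          # rewards[s][a], assumed known

P_hat = estimate_transitions(samples, n_states=2, n_actions=2)
gain, policy = relative_value_iteration(P_hat, rewards)
print(gain, policy)
```

On this toy data the estimated model favours action 1 in both states, with an average reward of approximately 1.6. An adaptive scheme of the kind the abstract describes would interleave estimation and control while data are still being collected, rather than estimating once from a fixed batch as done here.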
Research Association of Statistical Sciences | Papers
- CLUSTERING BY A FUZZY METRIC : APPLICATIONS TO THE CLUSTER-MEDIAN PROBLEM
- A FAMILY OF REGRESSION MODELS HAVING PARTIALLY ADDITIVE AND MULTIPLICATIVE COVARIATE STRUCTURE
- AN OPTIMAL STOPPING PROBLEM ON TREE
- ON THE ORDERS OF MAX-MIN FUNCTIONALS
- TREE EXPRESSIONS AND THEIR PRODUCT FORMULA