Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation
Abstract
Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss to enhance robustness and reliability. The proposed method is formulated as a linear programming problem, which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
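To illustrate how an absolute-loss fit reduces to a linear program, the following minimal Python sketch minimizes absolute Bellman residuals for a linear value function model by introducing one slack variable per sample. This is an illustrative interpretation of the abstract, not the authors' exact policy-iteration formulation; the function name `lap_fit`, the single policy-evaluation setup, and the use of `scipy.optimize.linprog` as the "standard optimization software" are assumptions.

```python
# Minimal sketch (not the paper's code): absolute-loss value fitting as an LP.
# Model: V(s) = theta^T phi(s); minimize sum_i |r_i + gamma*theta^T phi(s'_i) - theta^T phi(s_i)|.
import numpy as np
from scipy.optimize import linprog

def lap_fit(Phi, Phi_next, r, gamma=0.95):
    """Fit theta by minimizing the sum of absolute Bellman residuals.

    Phi, Phi_next : (n, d) feature matrices for states and next states
    r             : (n,) observed rewards
    """
    n, d = Phi.shape
    D = Phi - gamma * Phi_next                # residual_i = r_i - D_i @ theta

    # Decision vector x = [theta (d), e (n)]; objective: minimize sum of slacks e.
    c = np.concatenate([np.zeros(d), np.ones(n)])

    # e_i >= r_i - D_i theta   ->  -D theta - e <= -r
    # e_i >= D_i theta - r_i   ->   D theta - e <=  r
    A_ub = np.block([[-D, -np.eye(n)],
                     [ D, -np.eye(n)]])
    b_ub = np.concatenate([-r, r])

    bounds = [(None, None)] * d + [(0, None)] * n   # theta free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]                                 # fitted weight vector theta

# Toy usage with random features, purely illustrative.
rng = np.random.default_rng(0)
Phi, Phi_next = rng.normal(size=(50, 4)), rng.normal(size=(50, 4))
r = rng.normal(size=50)
theta = lap_fit(Phi, Phi_next, r)
```

Because the absolute loss grows only linearly with the residual, a few outlying rewards shift the slack variables rather than dominating the fit, which is the robustness property the abstract refers to.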
Authors
- Hachiya Hirotaka, Department of Computer Science, Tokyo Institute of Technology
- Sugiyama Masashi, Department of Computer Science, Tokyo Institute of Technology
- Kashima Hisashi, Department of Mathematical Informatics, The University of Tokyo
- Morimura Tetsuro, IBM Research - Tokyo
Related Papers
- Recent Advances and Trends in Large-Scale Kernel Methods
- Statistical active learning for efficient value function approximation in reinforcement learning (Neurocomputing)
- Improving the Accuracy of Least-Squares Probabilistic Classifiers
- Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation
- A New Meta-Criterion for Regularized Subspace Information Criterion
- Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise
- Multi-task learning with least-squares probabilistic classifiers (Pattern Recognition and Media Understanding)
- Multi-task learning with least-squares probabilistic classifiers (Information-Based Induction Sciences and Machine Learning)
- Adaptive importance sampling with automatic model selection in value function approximation (Neurocomputing)
- Analytic Optimization of Adaptive Ridge Parameters Based on Regularized Subspace Information Criterion(Neural Networks and Bioengineering)
- Adaptive Ridge Learning in Kernel Eigenspace and Its Model Selection
- Syntheses of New Artificial Zinc Finger Proteins Containing Trisbipyridine-ruthenium Amino Acid at The N-or C-terminus as Fluorescent Probes
- Analytic Optimization of Shrinkage Parameters Based on Regularized Subspace Information Criterion(Neural Networks and Bioengineering)
- Constructing Kernel Functions for Binary Regression(Pattern Recognition)
- Information-maximization clustering: analytic solution and model selection (Information-Based Induction Sciences and Machine Learning)
- Cartesian Kernel: An Efficient Alternative to the Pairwise Kernel
- New feature selection method for reinforcement learning: conditional mutual information reveals implicit state-reward dependency (Information-Based Induction Sciences and Machine Learning)
- Adaptive importance sampling with automatic model selection in reward weighted regression (Neurocomputing)
- Analysis and improvement of policy gradient estimation (Information-Based Induction Sciences and Machine Learning)
- Artist agent A^2: stroke painterly rendering based on reinforcement learning (Pattern Recognition and Media Understanding)
- Artist agent A^2: stroke painterly rendering based on reinforcement learning (Information-Based Induction Sciences and Machine Learning)
- Modified Newton Approach to Policy Search (Information-Based Induction Sciences and Machine Learning)
- Computationally Efficient Multi-Label Classification by Least-Squares Probabilistic Classifier (Information-Based Induction Sciences and Machine Learning)
- Relative Density-Ratio Estimation for Robust Distribution Comparison (Information-Based Induction Sciences and Machine Learning)
- Modified Newton Approach to Policy Search
- Squared-loss Mutual Information Regularization
- Computationally Efficient Multi-Label Classification by Least-Squares Probabilistic Classifier
- Feature Selection via l_1-Penalized Squared-Loss Mutual Information
- Relative Density-Ratio Estimation for Robust Distribution Comparison
- Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration (Information-Based Induction Sciences and Machine Learning)
- Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances