Adaptive Subspace Splitting Using Minimum-Description-Length-Principles for Efficient Boosting
Abstract
In RealBoost learning, selecting the best weak classifiers is one of the most significant tasks. Generally, it is done according to their discriminative power, which is normally measured by the dissimilarity of the distributions of positive and negative samples, using measures such as the Bhattacharyya distance, the Kullback-Leibler divergence, the more recent Jensen-Shannon divergence, and Infomax. These distributions are often estimated by splitting the range of continuous feature values into a predefined number of equal-width intervals and then computing histograms. Choosing the most appropriate number of intervals remains a challenging task: too few intervals may approximate the real distribution poorly, while too many may cause over-fitting, increase computation time, and waste storage space. This paper therefore proposes a discretization method based on the Minimum Description Length Principle (MDLP) to choose this number automatically and optimally. Experiments integrating MDLP-based subspace splitting into RealBoost have shown that strong classifiers learned by the proposed method achieve stable performance, avoid over-fitting, and require compact storage.
- A paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-12-09
Authors
-
Satoh Shinichi
Department of Informatics, The Graduate University for Advanced Studies / National Institute of Informa
-
Le Duy-Dinh
Department of Informatics, The Graduate University for Advanced Studies
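To make the trade-off described in the abstract concrete, the following is a minimal sketch of the conventional baseline the paper improves on: estimating class-conditional distributions with a fixed number of equal-width histogram intervals and scoring a feature's discriminative power by the Bhattacharyya distance. This is illustrative only (the function, parameter names, and the synthetic Gaussian data are assumptions, not the paper's implementation, and the paper's contribution is replacing the fixed `n_bins` with an MDLP-chosen discretization).

```python
import math
import random

def bhattacharyya(pos, neg, n_bins, lo, hi):
    """Bhattacharyya distance between positive and negative samples,
    estimated via equal-width histograms (hypothetical helper)."""
    width = (hi - lo) / n_bins

    def hist(xs):
        # Normalized equal-width histogram over [lo, hi].
        counts = [0] * n_bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), n_bins - 1)
            counts[i] += 1
        return [c / len(xs) for c in counts]

    p, q = hist(pos), hist(neg)
    # Bhattacharyya coefficient: overlap of the two distributions.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc) if bc > 0 else float("inf")

random.seed(0)
# Synthetic feature values for two well-separated classes.
pos = [random.gauss(0.3, 0.1) for _ in range(1000)]
neg = [random.gauss(0.7, 0.1) for _ in range(1000)]

# The bin count is the knob the paper targets: too few bins blur the
# distributions, too many over-fit the samples and inflate storage.
d = bhattacharyya(pos, neg, n_bins=16, lo=0.0, hi=1.0)
print(d)
```

Well-separated classes yield a large distance; identical sample sets yield zero. MDLP-based discretization, as proposed in the paper, replaces the manually fixed `n_bins` by choosing cut points whose coding cost is justified by the class-separation gain.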