Dealing with Imperfections in the Data for Propositional Learning (Special Issue: "Doctorial Theses on Artificial Intelligence")
Abstract
Propositional learning algorithms successfully extract concepts from data expressed through attributes and propositions on the attribute values. However, when the concepts acquired by propositional learning algorithms are later applied to previously unseen instances from the same domain, there is no guarantee that the learned concept will fit all the new instances. One reason propositional learning fails to achieve one hundred percent accuracy on unseen objects is that real domains show deficiencies, such as missing values, in the quality of the available data. When missing values occur in the data, propositional learning algorithms produce concepts that classify unseen objects less accurately than concepts learned from complete data. This dissertation formulates a new mechanism for missing value estimation that can help reduce the error rate of the learned concepts. The contribution is a cascaded and ordered construction of the decision trees employed to fill the missing values. This variation on the way of obtaining what are here named Attribute Trees was empirically shown to improve the accuracy of the decision tree learning algorithm on the completed data while holding the computational cost to an acceptable level. Although the method succeeded in increasing accuracy in some domains with artificially introduced missing values, more extensive experimentation revealed that the improvement in accuracy is not the general case. Thus, this dissertation additionally tries to identify the reasons for the lack of general applicability of the proposed method. A new analytical tool called mutual information analysis is defined and used to identify domains that are compatible with the attribute trees approach to filling missing values. As a result, the kind of data for which the new method is worth applying was determined.
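The abstract does not define its mutual information analysis, so the following is only a minimal sketch of the standard Shannon mutual information between two discrete attributes, the quantity such an analysis presumably builds on. The function name `mutual_information`, the `missing` marker, and the choice to skip instances with a missing value are assumptions made for illustration, not the dissertation's procedure.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys, missing=None):
    """Shannon mutual information I(X;Y), in bits, of two discrete attributes.

    Instances where either attribute equals `missing` are skipped
    (an assumption of this sketch, not taken from the dissertation).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x != missing and y != missing]
    n = len(pairs)
    if n == 0:
        return 0.0

    joint = Counter(pairs)              # counts of (x, y) value pairs
    px = Counter(x for x, _ in pairs)   # marginal counts of x
    py = Counter(y for _, y in pairs)   # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi
```

A high value for a pair of attributes suggests that one can be predicted from the other, which is the situation in which filling a missing value with a tree built on the other attributes is likely to help; a value near zero suggests that the other attribute carries little information for the estimate.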
- Paper published by the Japanese Society for Artificial Intelligence (社団法人人工知能学会)
- 2000-11-01