Two Step POS Selection for SVM Based Text Categorization (Special Section: Information Processing Technology for Web Utilization)
Abstract
Although many researchers have verified the superiority of Support Vector Machines (SVM) on text categorization tasks, some recent papers have reported much lower performance for SVM based text categorization methods when all types of parts of speech (POS) are used as input words and large numbers of training documents are processed. This was caused by the overfitting problem: SVM sometimes selected unsuitable support vectors for each category in the training set. To avoid the overfitting problem, we propose a two step text categorization method with variable cascaded feature selection (VCFS) using SVM. The VCFS method selects a pair consisting of the best number of words and the best POS combination for each category at each step of the cascade. We made use of the differences among the words with the highest mutual information for each category under each POS combination. Through experiments, we confirmed the validity of the VCFS method compared with other SVM based text categorization methods: our results showed that the macro-averaged F_1 measure (64.8%) of the VCFS method was significantly better than any previously reported F_1 measure, though the micro-averaged F_1 measure (85.4%) of the VCFS method was similar to them.
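The abstract reports both a macro-averaged and a micro-averaged F_1 measure, and the two can diverge when categories differ greatly in size. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the toy labels (`earn`, `rare`) are assumptions made for illustration only.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F_1 from raw counts; defined as 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1).

    gold, pred: lists of sets of category labels, one set per document
    (hypothetical input layout chosen for this sketch).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in g & p:
            tp[c] += 1          # category correctly assigned
        for c in p - g:
            fp[c] += 1          # category assigned but not in gold
        for c in g - p:
            fn[c] += 1          # gold category that was missed
    categories = set(tp) | set(fp) | set(fn)
    # Macro: average the per-category F_1, so every category weighs the same.
    macro = (sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
             if categories else 0.0)
    # Micro: pool the counts first, so frequent categories dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy data (assumed): one frequent category and one rare category.
gold = [{"earn"}] * 8 + [{"rare"}] * 2
pred = [{"earn"}] * 8 + [{"earn"}, {"rare"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Because the macro average gives a small category the same weight as a large one, an improvement on rare categories tends to show up in the macro-averaged score while leaving the micro-averaged score nearly unchanged, which is consistent with the pattern the abstract describes.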
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2004-02-01
Authors
-
Masuyama Takeshi
Information Technology Center The University Of Tokyo
-
NAKAGAWA Hiroshi
Information Technology Center, The University of Tokyo
Related papers
- Two Step POS Selection for SVM Based Text Categorization(Information Processing Technology for Web Utilization)
- Spectral Methods for Thesaurus Construction