Effects of Term Distributions on Binary Classification (Special Section: Knowledge, Information and Creativity Support System)
Abstract
Text classification is an important tool for supporting decision making. Recently, term distributions, used in addition to term frequency and inverse document frequency, have been shown to improve accuracy in multi-class classification. This paper investigates how these term distributions perform in binary classification using a centroid-based approach. In this one-against-the-rest setting there are only two classes: the positive (focused) class and the negative class. Because the negative class is usually much larger and more diverse than the positive one, a so-called hierarchical EM method is applied to cluster it into several homogeneous groups, which improves performance. Experimental results on two collections of web pages, Drug Information (DI) and WebKB, show the benefits of both term distributions and clustering for binary classification. The performance of the proposed method is also investigated on the Thai Herbal collection, whose texts are written in Thai.
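The abstract only outlines the approach. As a rough illustration of centroid-based one-against-the-rest classification with a clustered negative class, here is a minimal Python sketch. It assumes plain tf-idf weights and cosine similarity; the paper's term-distribution factors are not reproduced, and a simple spherical k-means (a swapped-in technique, not the authors' hierarchical EM) splits the negative class into sub-centroids. All names (`CentroidBinaryClassifier`, `k_neg`, `spherical_kmeans`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def cosine(a, b):
    """Dot product of sparse vectors; equals cosine when both have unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def fit_idf(docs):
    """Inverse document frequency log(N / df) from tokenized training docs."""
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(len(docs) / c) for t, c in df.items()}


def vectorize(doc, idf):
    """Unit-length tf-idf vector of a token list; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def centroid(vecs):
    """Class prototype: the normalized sum of unit document vectors."""
    total = defaultdict(float)
    for v in vecs:
        for t, w in v.items():
            total[t] += w
    return normalize(dict(total))


def spherical_kmeans(vecs, k, iters=10):
    """Group vectors by cosine similarity and return one centroid per group.

    Assumption: a stand-in for the paper's hierarchical EM clustering;
    the first k vectors are used as deterministic seeds.
    """
    k = min(k, len(vecs))
    centers = [dict(v) for v in vecs[:k]]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vecs:
            best = max(range(k), key=lambda i: cosine(v, centers[i]))
            groups[best].append(v)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [c for c, g in zip(centers, groups) if g]


class CentroidBinaryClassifier:
    """One positive centroid against k_neg negative sub-centroids (hypothetical)."""

    def __init__(self, k_neg=2):
        self.k_neg = k_neg

    def fit(self, pos_docs, neg_docs):
        self.idf = fit_idf(pos_docs + neg_docs)
        self.pos = centroid([vectorize(d, self.idf) for d in pos_docs])
        neg_vecs = [vectorize(d, self.idf) for d in neg_docs]
        self.negs = spherical_kmeans(neg_vecs, self.k_neg)
        return self

    def predict(self, doc):
        """True if the doc is closer to the positive centroid than to any negative one."""
        x = vectorize(doc, self.idf)
        neg_score = max(cosine(x, c) for c in self.negs)
        return cosine(x, self.pos) > neg_score


if __name__ == "__main__":
    pos = [["herb", "drug", "dose"], ["drug", "dose", "tablet"]]
    neg = [["course", "exam"], ["research", "lab"], ["exam", "lecture"]]
    clf = CentroidBinaryClassifier(k_neg=2).fit(pos, neg)
    print(clf.predict(["drug", "herb"]), clf.predict(["exam", "lecture"]))
```

Splitting the negative class into several sub-centroids lets a heterogeneous "rest" class be represented by more than one prototype, which mirrors the motivation stated in the abstract; the choice of clustering method and weighting here is illustrative only.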
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2007-10-01
Authors
-
Theeramunkong Thanaruk
Information And Computer Technology School Sirindhorn International Institute Of Technology Thammasa
-
Theeramunkong Thanaruk
Information Technology Program Sirindhorn International Institute Of Technology Thammasat University
-
LERTNATTEE Verayuth
Faculty of Pharmacy, Silpakorn University
Related papers
- A Family-Based Evolutional Approach for Kernel Tree Selection in SVMs
- Kernel Trees for Support Vector Machines (Knowledge, Information and Creativity Support System)
- Pattern-Based Features vs. Statistical-Based Features in Decision Trees for Word Segmentation (Natural Language Processing)
- Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy
- A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques