Feature Selection and Integration in Automatic Classification of Japanese Texts
Abstract
We explore the problem of automatic text classification for Japanese documents. Unlike languages written in Roman letters, Japanese text does not mark word boundaries. As a result, a bag-of-words approach to constructing features may not be sufficient for machine learning techniques to perform well. We propose a method for feature selection and construction to improve automatic classification performance on Japanese texts. Our approach involves extracting syntactic word categories and Chinese characters (Kanji) separately. We then combine the extracted information to build an informative feature set. We carried out various experiments using four learning algorithms to evaluate its effectiveness. The proposed method generally outperformed its counterpart methods for Japanese document representation.
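The idea of extracting Kanji and syntactic word categories separately and then merging them can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature-name prefixes, the choice of nouns and verbs as the retained categories, and the use of the basic CJK Unified Ideographs block as the definition of Kanji are all assumptions made for illustration. The part-of-speech tags are assumed to come from an external morphological analyzer and are passed in as ready-made pairs.

```python
import re
from collections import Counter

# Assumption: "Kanji" is approximated by the basic CJK Unified Ideographs block.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def kanji_features(text):
    """Count each Kanji character in the raw text (no word segmentation needed)."""
    return Counter("kanji:" + ch for ch in KANJI.findall(text))


def category_features(tagged_tokens, keep=("noun", "verb")):
    """Count tokens whose syntactic category is in `keep`.

    `tagged_tokens` is a list of (surface, category) pairs, assumed to be
    produced by an external morphological analyzer (hypothetical input format).
    """
    return Counter(
        f"{category}:{surface}"
        for surface, category in tagged_tokens
        if category in keep
    )


def combined_features(text, tagged_tokens):
    """Merge the two separately extracted feature sets into one count vector."""
    return kanji_features(text) + category_features(tagged_tokens)


# Hand-tagged example sentence: "I study Japanese."
text = "私は日本語を勉強する"
tagged = [
    ("私", "noun"), ("は", "particle"), ("日本語", "noun"),
    ("を", "particle"), ("勉強", "noun"), ("する", "verb"),
]
features = combined_features(text, tagged)
```

The resulting counts could be fed to any standard classifier as a sparse feature vector. Keeping the Kanji features independent of segmentation means they remain available even when the analyzer mis-segments a word.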