Improving Text Categorization with Semantic Knowledge in Wikipedia
スポンサーリンク
概要
- 論文の詳細を見る
Text categorization, especially short text categorization, is a difficult and challenging task since the text data is sparse and multidimensional. In traditional text classification methods, document texts are represented with "Bag of Words (BOW)" text representation schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used the Wikipedia-concept-based document representation method to take the place of traditional BOW model for text classification. In order to overcome the weakness of ignoring the semantic relationships among terms in document representation model and utilize rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
- The Institute of Electronics, Information and Communication Engineersの論文
著者
-
ZHOU Bin
School of Chemical Engineering, Nanjing University of Science and Technology
-
ZHOU Bin
School of Computer, National University of Defense Technology
-
WANG Xiang
School of Computer, National University of Defense Technology
-
FAN Hua
School of Computer, National University of Defense Technology
-
JIA Yan
School of Computer, National University of Defense Technology
-
CHEN Ruhua
School of Computer, National University of Defense Technology
関連論文
- Zr
- Efficient RFID Data Cleaning in Supply Chain Management
- Investigations on semiconductor bridge initiators with semiconductor arrester chip used to ESD protection
- Improving Text Categorization with Semantic Knowledge in Wikipedia
- Zr₀.₀₅Ti₀.₉₅O₂-Nanowire-Based Ultraviolet Photodetector with Self-Powered Properties