News Relation Discovery Based on Association Rule Mining with Combining Factors
Abstract
Recently, association rule mining has been applied to track and relate news documents from several sources because of its performance and scalability. This paper presents an empirical investigation of how the term representation basis, term weighting, and association measure affect the quality of the relations discovered among news documents. Twenty-four combinations, formed from two term representation bases, four term weightings, and three association measures, are explored, and their results are compared with human judgment on three-level relations: completely related, somehow related, and unrelated. Performance is evaluated by comparing the top-k results of each combination with those of the others using the so-called rank-order mismatch (ROM). The experimental results indicate that the combination of bigram (BG), term frequency with inverse document frequency (TFIDF), and confidence (CONF), as well as the combination of BG, TFIDF, and conviction (CONV), achieves the best performance in finding related documents, placing them in the upper ranks with 0.41% ROM on the top-50 mined relations. However, the combination of unigram (UG), TFIDF, and lift (LIFT) performs best at placing irrelevant relations in the lower ranks (top-1100), with 9.63% ROM. A detailed analysis of the number of relations at each of the three levels with respect to their rankings is also performed in order to examine the characteristics of the resulting relations. Finally, a discussion and an error analysis are given.
- 2011-03-01
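As a rough illustration of the three association measures named in the abstract (confidence, lift, and conviction), the sketch below scores directed relations between documents using their textbook definitions. It is not the paper's implementation: it assumes each document is treated as an item and each distinct term as a transaction, uses binary term occurrence in place of the weightings such as TFIDF, and the names `bigrams`, `association_measures`, and the toy documents are hypothetical.

```python
from itertools import permutations


def bigrams(tokens):
    """Return the set of adjacent word pairs (a simple bigram term basis)."""
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def association_measures(docs):
    """Score every ordered document pair (a -> b).

    docs maps a document id to its set of terms. Assumption: terms act as
    transactions and documents as items, with binary (unweighted) occurrence.
    """
    vocabulary = set().union(*docs.values())
    n = len(vocabulary)
    # support(d): fraction of all terms that occur in document d
    support = {d: len(terms) / n for d, terms in docs.items()}

    results = {}
    for a, b in permutations(docs, 2):
        joint = len(docs[a] & docs[b]) / n  # support of {a, b}
        conf = joint / support[a] if support[a] else 0.0
        lift = conf / support[b] if support[b] else 0.0
        # conviction is unbounded when the rule never fails (conf = 1)
        conv = float("inf") if conf >= 1.0 else (1 - support[b]) / (1 - conf)
        results[(a, b)] = {
            "support": joint,
            "confidence": conf,
            "lift": lift,
            "conviction": conv,
        }
    return results


# Hypothetical toy documents, tokenised by whitespace.
docs = {
    "d1": bigrams("flood hits northern province".split()),
    "d2": bigrams("flood hits northern province again".split()),
    "d3": bigrams("stock market closes higher".split()),
}
measures = association_measures(docs)

# Rank relations by confidence, highest first, as a top-k list would.
ranked = sorted(measures, key=lambda p: measures[p]["confidence"], reverse=True)
for pair in ranked:
    print(pair, measures[pair])
```

Swapping `bigrams` for a plain `set(tokens)` gives a unigram basis, and sorting by `"lift"` or `"conviction"` instead of `"confidence"` changes which relations reach the upper ranks, which is the kind of variation the abstract's twenty-four combinations explore.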
Authors
-
Theeramunkong Thanaruk
School Of Information And Computer Technology Siit Thammasat University
-
NANTAJEEWARAWAT Ekawit
School of Information and Computer Technology Sirindhorn International Institute of Technology, Tham
-
Nantajeewarawat Ekawit
School Of Information Computer And Communication Technology Sirindhorn International Institute Of Te
-
Theeramunkong Thanaruk
School Of Information Computer And Communication Technology Sirindhorn International Institute Of Te
-
KITTIPHATTANABAWON Nichnan
School of Information, Computer and Communication Technology, Sirindhorn International Institute of
-
Kittiphattanabawon Nichnan
School Of Information Computer And Communication Technology Sirindhorn International Institute Of Te
Related papers
- News Relation Discovery Based on Association Rule Mining with Combining Factors
- Extracting Chemical Reactions from Thai Text for Semantics-Based Information Retrieval
- Extracting Semantic Frames from Thai Medical-Symptom Unstructured Text with Unknown Target-Phrase Boundaries
- Special Section on Knowledge, Information and Creativity Support System
- Quality Evaluation for Document Relation Discovery Using Citation Information(Data Mining)
- Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts