Hybrid Chinese Term Indexing and the 2-Poisson Model (Special Issue on Text Processing for Information Access)
Abstract
Retrieval effectiveness depends on both the retrieval model and how terms are extracted and indexed. Chinese, Japanese, and Korean text has no spaces to delimit words. In the NTCIR-III open evaluation workshop, indexing with hybrid terms (i.e., words and bigrams) was found to be effective and efficient when used with the 2-Poisson model. Here, we explore another Okapi weight based on the 2-Poisson model, BM25, and compare its performance under bigram, word, and hybrid term indexing strategies. Results show that word indexing is the most efficient in indexing time and storage, whereas hybrid term indexing requires the least retrieval time per query. Without pseudo-relevance feedback (PRF), our BM25 implementation appeared to yield better retrieval effectiveness for short queries. With PRF, our implementation of the BM11 weights, a simplified version of BM25, combined with hybrid term indexing remained the most effective combination in this study.
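As an illustration only, the sketch below combines the two ingredients the abstract names. It extracts hybrid terms (dictionary words plus overlapping character bigrams) and scores documents with the standard Okapi BM25 term weight. Following the usual Okapi formulation, it treats BM11 as the b = 1 case of BM25. This is not the authors' implementation, and several choices are hypothetical:

- the toy dictionary `TOY_DICT`;
- the greedy forward-maximum-matching segmenter;
- keeping only words longer than two characters, because two-character words already appear as bigrams;
- the parameter values k1 = 1.2 and b = 0.75;
- the Robertson-Sparck Jones idf without a query-term-frequency (k3) factor;
- measuring document length in hybrid terms.

```python
import math
from collections import Counter

# Hypothetical toy dictionary for illustration; not the word list used in the paper.
TOY_DICT = {"信息检索", "信息", "检索", "模型"}
MAX_WORD_LEN = 4  # assumption: longest dictionary word tried during matching


def segment(text, dictionary=TOY_DICT, max_len=MAX_WORD_LEN):
    """Greedy forward maximum matching; unmatched characters become single-character tokens."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def hybrid_terms(text):
    """Hybrid terms: words longer than two characters plus every overlapping bigram.

    Assumption: two-character words are skipped because the bigram list already contains them.
    """
    words = [w for w in segment(text) if len(w) > 2]
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return words + bigrams


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document (idf times saturated, length-normalized tf)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (norm + tf)


def bm11_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2):
    """BM11 as the b = 1 special case of BM25 (full document-length normalization)."""
    return bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=k1, b=1.0)


def score(query, doc_terms, collection, weight=bm25_weight):
    """Sum the weights of the distinct hybrid query terms that occur in the document."""
    n_docs = len(collection)
    avg_len = sum(len(d) for d in collection) / n_docs
    tf = Counter(doc_terms)
    total = 0.0
    for term in set(hybrid_terms(query)):
        if tf[term] > 0:
            df = sum(1 for d in collection if term in d)
            total += weight(tf[term], df, n_docs, len(doc_terms), avg_len)
    return total


if __name__ == "__main__":
    docs = ["信息检索模型", "中文分词方法", "概率检索模型", "图像处理技术"]
    collection = [hybrid_terms(d) for d in docs]
    for text, terms in zip(docs, collection):
        bm25 = score("信息检索", terms, collection)
        bm11 = score("信息检索", terms, collection, weight=bm11_weight)
        print(text, round(bm25, 3), round(bm11, 3))
```

In this formulation, b controls how strongly a document's length is normalized against the average. With b = 0, length is ignored. With b = 1 (the BM11 case here), term frequency is normalized fully by relative document length. With a tiny collection like the one above, a term that occurs in more than half the documents receives a negative idf. Real systems often clamp or otherwise adjust that case.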
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2003-09-01
Authors
-
Luk Robert
Department of Computing, Hong Kong Polytechnic University
-
Wong Kam
Department of System Engineering and Engineering Management, Chinese University of Hong Kong