A Method for Approximating Document Frequency in Top-k Document Retrieval
Abstract
Top-k document retrieval is an essential task for real-world applications such as web search and data mining. A new class of indexes derived from the suffix array family has been studied over the past decade. Owing to their algorithmic foundations, these indexes are expected to improve the efficiency of document retrieval and to support general document retrieval, where documents are not necessarily written in natural language. One of the main features of these indexes is that they index all substrings of the document collection. As a result, handling document frequency is costly in terms of space. Previous work [3] provided a workaround that weights pseudo-terms, but that weighting is not purely related to document frequency. We propose two methods: approximating the document frequency of terms from the index structure, and using term counts for term weighting. Our main contribution is to provide simple methods that can run on unadorned indexes. Experimental results show that our methods achieve a good trade-off between efficiency and effectiveness in a practical document retrieval task.
- 2014-01-31
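To make the distinction in the abstract concrete, the following is a minimal sketch of why term count is cheap on a suffix-array-style index while exact document frequency needs extra information. It is not the paper's approximation method, which the abstract does not spell out. The names `build_index`, `sa_range`, and `term_stats` are hypothetical, the suffix array is built naively, and the separator character `\x00` is assumed not to occur in any document or pattern.

```python
from bisect import bisect_left, bisect_right


def build_index(docs):
    """Build a naive suffix array plus a document array (illustrative only)."""
    # Concatenate documents; "\x00" is an assumed separator absent from the text.
    text = "".join(d + "\x00" for d in docs)
    # Naive O(n^2 log n) construction; real indexes use compressed structures.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Map every text position to the id of the document containing it.
    doc_of_pos = []
    for doc_id, d in enumerate(docs):
        doc_of_pos.extend([doc_id] * (len(d) + 1))
    # The document array is the extra structure that exact df relies on.
    doc_array = [doc_of_pos[p] for p in sa]
    return text, sa, doc_array


def sa_range(text, sa, pattern):
    """Return the half-open suffix array interval of suffixes starting with pattern."""
    m = len(pattern)
    # Length-m prefixes of sorted suffixes are themselves sorted.
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def term_stats(docs, pattern):
    """Return (term_count, document_frequency) for an arbitrary substring."""
    text, sa, doc_array = build_index(docs)
    lo, hi = sa_range(text, sa, pattern)
    term_count = hi - lo                      # available from the interval alone
    doc_freq = len(set(doc_array[lo:hi]))     # needs per-suffix document ids
    return term_count, doc_freq


if __name__ == "__main__":
    docs = ["abracadabra", "cadabra", "xyz"]
    print(term_stats(docs, "abra"))  # (3, 2): three occurrences in two documents
```

The term count falls out of the interval width, whereas the exact document frequency requires scanning a document array that a plain index does not store. That gap is the space cost the abstract refers to.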
Authors
Related papers
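- A Real-Time Disaster Information Map System Capable of Acquiring Location Information
- An On-Campus Real-Time Disaster Information Map: Verifying the Validity of Scenario-Based Evacuation Route Selection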
- A Method for Approximating Document Frequency in Top-k Document Retrieval