Fast Algorithms for k-Word Proximity Search
Abstract
When searching a huge collection of documents, we often specify several keywords and use conjunctive queries to narrow the search results. Although the retrieved documents contain all of the keywords, the positions of the keywords are usually not considered. As a result, the search results include some meaningless documents. It is therefore effective to rank documents according to the proximity of the keywords within them. This ranking can be regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear close to one another. One is based on a plane-sweep algorithm and the other on a divide-and-conquer approach. Both algorithms run in O(n log n) time, where n is the number of occurrences of the given keywords. We run the algorithms on a large collection of HTML files and verify their effectiveness.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2001-09-01
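
The abstract names a plane-sweep approach but gives no details, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes each keyword comes with a sorted list of its positions in one document, and it sweeps left to right over the merged positions to find the shortest span containing every keyword. The function name `smallest_window` and the input layout are hypothetical, chosen for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def smallest_window(positions: Dict[str, List[int]]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the shortest span containing every keyword.

    Assumption: each position list is sorted in ascending order.
    Returns None if there are no keywords or some keyword never occurs.
    """
    lists = list(positions.values())
    if not lists or any(not p for p in lists):
        return None

    # The heap holds one current occurrence per keyword:
    # (position, keyword index, offset into that keyword's list).
    heap = [(p[0], i, 0) for i, p in enumerate(lists)]
    heapq.heapify(heap)
    right = max(p[0] for p in lists)  # rightmost current occurrence
    best: Optional[Tuple[int, int]] = None

    while True:
        left, i, j = heap[0]  # leftmost current occurrence
        if best is None or right - left < best[1] - best[0]:
            best = (left, right)
        if j + 1 == len(lists[i]):
            # The leftmost keyword has no later occurrence, so no
            # window further to the right can contain all keywords.
            return best
        # Advance the sweep: replace the leftmost occurrence by the
        # next occurrence of the same keyword.
        nxt = lists[i][j + 1]
        right = max(right, nxt)
        heapq.heapreplace(heap, (nxt, i, j + 1))


example = {"fast": [1, 10, 20], "proximity": [4, 12], "search": [9, 15]}
print(smallest_window(example))  # (9, 12)
```

With k keywords and n occurrences in total, this sketch performs at most n heap operations of cost O(log k) each, which is within the O(n log n) bound stated in the abstract. Documents could then be ranked by the width of the returned span.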
Authors
-
Sadakane Kunihiko
The Graduate School Of Information Science Tohoku University
-
Imai Hiroshi
The Department Of Information Science The University Of Tokyo
Related papers
- Efficient Algorithms for Constructing a Pyramid from a Terrain (Computational Geometry, Foundations of Computer Science)
- Divergence-Based Geometric Clustering and Its Underlying Discrete Proximity Structures (Special Issue on Surveys on Discovery Science)
- Fast Algorithms for k-Word Proximity Search
- Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting