New Indices for Japanese Text: A New Word-based Index of Non-segmented Text for Fast Full-text-search Systems (Special Issue on Generation Database Technology for Internet, Multimedia and Mobile Computing)
Abstract
In this paper, we propose a new type of word-based index, the complete maximal word index, which is suitable for text in continuous-sequence languages such as Japanese and Chinese. This index solves the problems encountered when applying the word-based index file method to a full-text retrieval system for such text. The proposed index guarantees retrieval with no false dismissals for arbitrary search strings and remains very fast even for longer search strings, while being smaller than other types of index such as the n-gram-based index. A formal definition of the complete maximal word index is given, and its generation and search algorithms are described. Experimental results are presented to demonstrate that the approach is effective and practical. The proposed index is also promising in that it can be naturally incorporated into inexact-match retrieval models such as the probabilistic and vector space models, because it contains word-based information such as word frequencies and word distributions.
- Paper published by the Information Processing Society of Japan (IPSJ)
- 1998-04-15
Authors
-
Kanno Yuji
Multimedia Systems Research Laboratory, Matsushita Electric Industrial Co., Ltd.
-
Noguchi Naohiko
Multimedia Systems Research Laboratory, Matsushita Electric Industrial Co., Ltd.
-
Inaba Mitsuaki
Multimedia Systems Research Laboratory, Matsushita Electric Industrial Co., Ltd.
-
Kurachi Kazuaki
Systems Integration Business Center, Matsushita Electric Industrial Co., Ltd.