Efficient Retrieval of Labeled Binary Trees

概要

論文の詳細を見る
Generalizations of the classical ngram indexing technique provide a powerful tool for the fast retrieval of tree structures. We investigate the special case of binary trees, which are heavily used, for example, in computer linguistics: Given a database T = {t_1, ..., t_n} of labeled binary trees and a query tree q, find efficiently all trees t ∈ T that contain q as subtree. For supporting queries of this type, we propose an indexing technique that covers the database trees t ∈ T with smaller trees of a fixed class. Thus, each index record represents a subtree that is contained in at least one database tree. To answer a given query q, the database trees of T that contain all of q's cover trees are preselected as candidates, which in turn are tested rigorously for containment of q. We present results from two test suites : one with databases of 10,000 randomly generated binary trees each and one with the 29,394 most extensive phrase structure trees found in the morphosyntactical analysis of the Old Testament's Hebrew texts Genesis, Exodus, and Leviticus.
一般社団法人電子情報通信学会の論文
1995-11-25