Extracting Partial Parsing Rules from Tree-Annotated Corpus: Toward Deterministic Global Parsing (Natural Language Processing)
Abstract
It is not always possible to find a global parse for an input sentence, owing to problems such as errors in the sentence and incompleteness of the lexicon and grammar. Partial parsing is an alternative approach that responds to these problems. Partial parsing techniques try to recover syntactic information efficiently and reliably by sacrificing completeness and depth of analysis. One of the difficulties in partial parsing is how the grammar might be extracted automatically. In this paper we present a method of automatically extracting partial parsing rules from a tree-annotated corpus using the decision tree method. Our goal is deterministic global parsing using partial parsing rules; in other words, we aim to extract partial parsing rules with higher accuracy and broader expansion. First, we define a rule template that enables learning a subtree for a given substring, so that the resultant rules can be more specific and stricter to apply. Second, rule candidates extracted from a training corpus are enriched with contextual and lexical information using the decision tree method and verified through cross-validation. Last, we underspecify non-deterministic rules by merging the ambiguous substructures in those rules. The learned grammar is similar to a phrase structure grammar with contextual and lexical information, but allows building structures of depth one or more. Thanks to automatic learning, the partial parsing rules can be consistent and domain-independent. Partial parsing with this grammar processes an input sentence deterministically using longest-match heuristics, and recursively applies rules to the input sentence. The experiments showed that the partial parser using automatically extracted rules is not only accurate and efficient but also achieves reasonable coverage for Korean.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-06-01
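The abstract's description of deterministic, longest-match, recursively applied partial parsing can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy rules in RULES, the tag names, and the helpers label, one_pass and partial_parse are hypothetical and are not the rules or code of the paper. The sketch also leaves out the contextual and lexical conditions that the paper learns with decision trees, and it builds only depth-one structures per rule.

```python
from typing import Dict, List, Tuple, Union

# A node is either a leaf tag (str) or a (label, children) pair.
Node = Union[str, Tuple[str, list]]
Rules = Dict[Tuple[str, ...], str]

# Hypothetical toy rules: a sequence of category labels -> parent label.
RULES: Rules = {
    ("DET", "ADJ", "N"): "NP",
    ("DET", "N"): "NP",
    ("V", "NP"): "VP",
    ("P", "NP"): "PP",
}


def label(node: Node) -> str:
    """Return the category label of a leaf or of a built subtree."""
    return node if isinstance(node, str) else node[0]


def one_pass(nodes: List[Node], rules: Rules, max_len: int):
    """Scan left to right, always taking the longest matching rule."""
    out: List[Node] = []
    changed = False
    i = 0
    while i < len(nodes):
        matched = False
        # Try the longest candidate span first (longest-match heuristic).
        for n in range(min(max_len, len(nodes) - i), 1, -1):
            key = tuple(label(x) for x in nodes[i:i + n])
            if key in rules:
                out.append((rules[key], nodes[i:i + n]))
                i += n
                matched = changed = True
                break
        if not matched:
            out.append(nodes[i])  # no rule applies here; keep the node
            i += 1
    return out, changed


def partial_parse(tags: List[str], rules: Rules) -> List[Node]:
    """Apply the rules repeatedly until no rule matches any more."""
    if any(len(k) < 2 for k in rules):
        # Assumption of this sketch: every rule shrinks the sequence,
        # which guarantees that the loop below terminates.
        raise ValueError("rules must cover at least two symbols")
    max_len = max((len(k) for k in rules), default=0)
    nodes: List[Node] = list(tags)
    changed = True
    while changed:
        nodes, changed = one_pass(nodes, rules, max_len)
    return nodes


TAGS = ["DET", "ADJ", "N", "V", "DET", "N", "P", "DET", "N"]
chunks = partial_parse(TAGS, RULES)
print([label(c) for c in chunks])
```

With these toy rules the parser stops with several top-level chunks rather than a single tree, which is the sense in which the result is partial: no ambiguity is explored, and whatever no rule covers is left unattached.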
Authors
-
CHOI Key-Sun
Dept. of Computer Science, KAIST
-
Choi Key‐sun
KAIST, Daejeon, Korea
-
LEE Kong-Joo
Dept. of CSE, Ewha Womans Univ.
-
CHOI Myung-Seok
Dept. of EECS, KAIST
-
KIM Gil
Dept. of EECS, KAIST
-
Lee Kong-joo
Dept. of Information & Communication Engineering, Chungnam Nat'l Univ.
Related papers
- An Alignment Model for Extracting English-Korean Translations of Term Constituents (Natural Language Processing)
- Normalizing Syntactic Structure Using Part-of-Speech Tags and Binary Rules (Development of Advanced Computer Systems)
- Extracting Partial Parsing Rules from Tree-Annotated Corpus: Toward Deterministic Global Parsing (Natural Language Processing)
- Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information (Natural Language Processing)
- Improving Automatic English Writing Assessment Using Regression Trees and Error-Weighting