Automatic Extraction of Japanese Probabilistic Context Free Grammar From a Bracketed Corpus
Abstract
In this paper, we describe a method for extracting a probabilistic context free grammar (PCFG) of Japanese from a bracketed corpus. To extract grammar rules, we assign appropriate non-terminal symbols to the intermediate nodes of the bracketed trees, taking the heads of phrases into account. We estimate the probability of each rule from its frequency of occurrence. We also propose several improvements to the extracted grammar. The size of the grammar is reduced by removing redundant rules. The number of parse trees is reduced (1) by allowing only right-linear binary branching trees for constituents that consist of items of the same POS, (2) by subcategorizing the POSs "symbol" ("KIGOU") and "postposition" ("JOSI"), and (3) by assigning a consistent structure to constructions that express clausal modality. Finally, we conducted an experiment to evaluate the proposed methods. A total of 2,219 grammar rules were extracted from about 180,000 sentences. When we analyzed 20,000 test sentences with the extracted grammar, the acceptance rate was 92%, showing that the grammar has broad coverage. For the 30 most probable parse trees, we obtained 62% bracket recall, 74% bracket precision, and 29% sentence accuracy.
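The abstract does not spell out the labeling scheme or the estimator beyond "frequency of occurrence", so the following is only a minimal sketch of the general idea, not the authors' method. It assumes (1) a head-final rule, where the rightmost child of each bracket is its head (a common simplification for Japanese); (2) a hypothetical labeling convention in which a phrase is named after its head POS plus "P"; and (3) relative-frequency estimation, P(A -> b) = count(A -> b) / count(A). The tree encoding, the POS tags, and the function names `head_pos`, `label`, `collect_rules`, and `estimate_pcfg` are all illustrative.

```python
from collections import Counter

# Hypothetical encoding: a leaf is a (POS, word) tuple and an internal node
# (one bracket from the corpus) is a list of child nodes.


def head_pos(node):
    """POS of the lexical head, ASSUMING the rightmost child is the head."""
    if isinstance(node, tuple):
        return node[0]
    return head_pos(node[-1])


def label(node):
    """Hypothetical non-terminal: the POS for a leaf, head POS + 'P' for a phrase."""
    if isinstance(node, tuple):
        return node[0]
    return head_pos(node) + "P"


def collect_rules(node, counts):
    """Count one rule LHS -> RHS for every internal node of the tree."""
    if isinstance(node, tuple):
        return
    lhs = label(node)
    rhs = tuple(label(child) for child in node)
    counts[(lhs, rhs)] += 1
    for child in node:
        collect_rules(child, counts)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(A -> b) = count(A -> b) / count(A)."""
    counts = Counter()
    for tree in trees:
        collect_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}


# Two toy bracketed sentences (romanized): "Taro ga hon wo yon da"
# and "Hanako ga ki ta".
trees = [
    [[("noun", "Taro"), ("postp", "ga")],
     [[("noun", "hon"), ("postp", "wo")], [("verb", "yon"), ("aux", "da")]]],
    [[("noun", "Hanako"), ("postp", "ga")], [("verb", "ki"), ("aux", "ta")]],
]

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_pcfg(trees).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

On these two toy trees, the sketch yields `auxP -> postpP auxP` with probability 0.60, `auxP -> verb aux` with 0.40, and `postpP -> noun postp` with 1.00, so the rules for each left-hand side sum to one. The redundancy removal, right-linear binarization, and POS subcategorization described in the abstract are not modeled here.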
- Papers of the Association for Natural Language Processing
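- An Efficient Method for Determining Field Association Terms of Compound Words (複合語の分野連想語の効率的決定法)
- Building a Paraphrase Corpus Using a Class-Oriented Example Collection Method (クラス指向事例収集手法による言い換えコーパスの構築)
- Large-Scale Addition of Usage Examples to a Verb Argument Structure Dictionary (動詞項構造辞書への大規模用例付与)
- Research Trends in Paraphrasing Technology (言い換え技術に関する研究動向)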
- Morpho-Syntactic Rules for Detecting Japanese Term Variation: Establishment and Evaluation