Analysis of Japanese Compound Nouns by Direct Text Scanning
Abstract
Compound nouns tend to be important words because a single compound noun conveys a great deal of information and can even summarize a document. The analysis of compound nouns can therefore contribute to machine translation, information extraction, and information retrieval. Because compound nouns lack syntactic clues, existing methods have relied on manually written rules and thesauri to analyze the word dependency structure within them. As a result, those methods lack robustness on open corpora such as newspaper articles, which contain many unregistered words. This paper presents a thesaurus-free, corpus-based approach that scans a corpus with a set of templates and extracts co-occurrence data for the nouns that make up the compound noun. Unregistered words such as abbreviations and short compound nouns are detected during template matching, and co-occurrence data for the newly found words are extracted as well, which makes the analysis robust and accurate. The accuracy of the method was evaluated on 400 compound nouns of lengths 5, 6, 7, and 8, with 100 compound nouns per length. The numbers of correctly analyzed compound nouns were 90, 86, 84, and 84 out of 100 for lengths 5, 6, 7, and 8, respectively.
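The template-scanning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two templates (`AB` adjacency and `AのB`), the function names, the toy corpus, and the bracketing rule for three-noun compounds (a simple adjacency comparison) are all assumptions made for illustration, and the compound is assumed to be already segmented into nouns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical templates; the abstract does not list the ones actually used.
TEMPLATES = ("{a}{b}", "{a}の{b}")


def cooccurrence_counts(nouns, corpus):
    """Count template matches in the corpus for every ordered noun pair."""
    counts = Counter()
    for a, b in combinations(nouns, 2):
        for template in TEMPLATES:
            pattern = template.format(a=a, b=b)
            counts[(a, b)] += sum(line.count(pattern) for line in corpus)
    return counts


def bracket_three(nouns, counts):
    """Bracket a three-noun compound by comparing adjacent-pair counts.

    This adjacency rule is an assumed stand-in for the paper's analysis.
    """
    n1, n2, n3 = nouns
    if counts[(n1, n2)] >= counts[(n2, n3)]:
        return ((n1, n2), n3)  # left-branching: [[n1 n2] n3]
    return (n1, (n2, n3))      # right-branching: [n1 [n2 n3]]


# Toy corpus and pre-segmented compound, both invented for this example.
corpus = ["自然言語の研究", "自然言語を扱う", "言語処理の技術"]
nouns = ["自然", "言語", "処理"]
counts = cooccurrence_counts(nouns, corpus)
print(bracket_three(nouns, counts))
```

Plain substring counting keeps the sketch thesaurus-free, in the spirit of the abstract; a real system would also need to handle segmentation and the detection of unregistered words, which this example leaves out.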
- Papers from the Association for Natural Language Processing (言語処理学会)
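- An Efficient Method for Determining Field-Associated Terms of Compound Words
- Building a Paraphrase Corpus with a Class-Oriented Example Collection Method
- Large-Scale Assignment of Examples to a Verb Argument Structure Dictionary
- Research Trends in Paraphrasing Technology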
- Morpho-Syntactic Rules for Detecting Japanese Term Variation: Establishment and Evaluation