Acquisition Method of Unknown Word's Morpheme Dictionary Information Using Word's Juxtapositional Relationships
Abstract
This paper describes an inference method for acquiring morpheme information for unknown words from a large corpus. The method comprises three functions: inferring a morpheme's part of speech, conjugation type, and conjugation (called morpheme attributes in this paper); updating the inferred morpheme attributes using probability factors derived from a large corpus; and inferring Japanese morphemes. The conjunctive relationships between words in a sentence are used to infer the morpheme attributes of an unknown word. Because a Japanese sentence is a sequence of characters with no blank spaces marking word boundaries, the system has to identify word boundaries itself. To do this, it first follows character-type sequence rules to search for the cardinal points of a partition. It then infers morphemes from the partition using the morphemes in its dictionary. In the initial stage, the system has a complete dictionary of a few special parts of speech (particles and auxiliary verbs). Morphemes are then selected based on the result of this morpheme attribute inference process. Based on these concepts, we developed a Japanese morpheme information acquisition system. Our experiments were conducted on a large corpus of 240,000 morphemes, composed of ASAHI newspaper editorials over a six-month period. We obtained a morpheme inference accuracy of 90.5% for inflections and 95.2% for other parts of speech. The overall average morpheme inference accuracy was 94.6%. A total of 15,523 unique headwords were automatically obtained from 228,450 inferred morphemes.
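The partitioning step can be illustrated with a minimal sketch. The abstract does not give the paper's actual character-type sequence rules, so the rule used here (place a candidate boundary wherever the character type changes) is an assumption, as are the names `char_type`, `split_by_char_type`, `label_segments`, and the tiny `SEED_DICTIONARY` of particles and auxiliary verbs.

```python
from itertools import groupby

# Hypothetical seed dictionary: the abstract only says the initial dictionary
# covers particles and auxiliary verbs; these entries are illustrative.
SEED_DICTIONARY = {
    "は": "particle",
    "を": "particle",
    "が": "particle",
    "た": "auxiliary verb",
}


def char_type(ch):
    """Classify one character by script (an assumed, simplified scheme)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def split_by_char_type(text):
    """Split text into runs of one character type.

    Each change of character type is treated as a candidate boundary.
    This stands in for the paper's rules, which are not specified here.
    """
    return [("".join(group), ctype) for ctype, group in groupby(text, key=char_type)]


def label_segments(text):
    """Mark each run as a known seed morpheme or as an unknown candidate."""
    return [
        (surface, SEED_DICTIONARY.get(surface, "unknown"))
        for surface, _ctype in split_by_char_type(text)
    ]


if __name__ == "__main__":
    for surface, label in label_segments("私は東京タワーを見た"):
        print(surface, label)
```

In this sketch the runs labeled `unknown` are the candidates whose attributes would then be inferred from their neighbors, for example a run followed by the particle `を`. A real system would also need to split inside a single-type run, which this sketch does not attempt.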
- Papers from the Association for Natural Language Processing (言語処理学会)
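- An Efficient Method for Determining Field-Association Terms of Compound Words (複合語の分野連想語の効率的決定法)
- Building a Paraphrase Corpus with a Class-Oriented Example Collection Method (クラス指向事例収集手法による言い換えコーパスの構築)
- Large-Scale Assignment of Examples to a Verb Argument Structure Dictionary (動詞項構造辞書への大規模用例付与)
- Research Trends in Paraphrasing Technology (言い換え技術に関する研究動向)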
- Morpho-Syntactic Rules for Detecting Japanese Term Variation: Establishment and Evaluation