Acquisition Method of Unknown Word's Morpheme Dictionary Information Using Word's Juxtapositional Relationships

Abstract

This paper describes an inference method for acquiring morpheme information for unknown words from a large corpus. The method comprises three functions: inferring a morpheme's part of speech, conjugation type, and conjugation (we call these morpheme attributes in this paper); updating the inferred morpheme attributes using probability factors derived from a large corpus; and inferring Japanese-language morphemes. The conjunctive relationships between words in a sentence are used to infer the morpheme attributes of an unknown word. Since a Japanese sentence is a sequence of characters with no blank spaces to mark word boundaries, our system has to identify word boundaries itself. To do this, it first follows character-type sequence rules to search for the cardinal points of a partition. It then infers morphemes from the partition using the morphemes in its dictionary. In the initial stage, the system's dictionary is complete for a few special parts of speech (particles and auxiliary verbs). Morphemes are then selected based on the result of this morpheme attribute inference process. Based on these concepts, we developed a Japanese morpheme information acquisition system. Our experiments were conducted on a large corpus of 240,000 morphemes, composed of ASAHI newspaper editorials over a six-month period. We obtained a morpheme inference accuracy of 90.5% for inflected words and 95.2% for other parts of speech. The overall average morpheme inference accuracy was 94.6%. A total of 15,523 unique headwords were automatically obtained from 228,450 inferred morphemes.
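
The abstract does not spell out the character-type sequence rules, so the following is only a minimal sketch of the general idea, assuming the simplest possible rule: a candidate partition point is placed wherever the character type (kanji, hiragana, katakana, other) changes. The function names `char_type`, `candidate_boundaries`, and `split_by_char_type` are hypothetical and are not taken from the paper; the actual system presumably uses richer rules and then refines the partition against its dictionary.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify a character by Unicode block (coarse, illustrative only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # includes the long-vowel mark U+30FC
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "kanji"
    return "other"


def candidate_boundaries(text: str) -> list[int]:
    """Return indices where the character type changes.

    Assumption: every type change is a candidate partition point.
    The paper's real rules are not given in the abstract.
    """
    return [
        i
        for i in range(1, len(text))
        if char_type(text[i]) != char_type(text[i - 1])
    ]


def split_by_char_type(text: str) -> list[str]:
    """Split text into maximal runs of a single character type."""
    return ["".join(run) for _, run in groupby(text, key=char_type)]


if __name__ == "__main__":
    sample = "東京タワーに行く"
    print(candidate_boundaries(sample))
    print(split_by_char_type(sample))
```

Note that this naive rule splits the verb 行く into 行 and く, which is why a later step that consults known morphemes (such as particles and auxiliary verbs) is needed to decide which candidate points are real word boundaries.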
Papers of the Association for Natural Language Processing
