:An Approach to The Unknown Word Problem
Abstract
Morphological analysis is one of the basic techniques used in Japanese sentence analysis. A morpheme is defined as the minimal grammatical unit, such as a word or a suffix. Morphological analysis is the process of segmenting a given sentence into a sequence of morphemes and assigning grammatical attributes to each morpheme, such as a part-of-speech (POS) and an inflection type. Recently, one of the most important issues in morphological analysis has become how to deal with unknown words, that is, words which are not found in a dictionary or a training corpus. So far, there have been two main statistical approaches to this issue. One is to acquire unknown words from corpora and incorporate them into a dictionary. The other is to estimate a model which can recognize unknown words correctly. We would like to make good use of both approaches. If words acquired by the former method could be added to a dictionary, and a model developed by the latter method could consult the amended dictionary, then that model could be the best statistical model with the potential to overcome the unknown word problem. In this paper, we propose a method for Japanese morphological analysis based on a maximum entropy (ME) model. This method uses a model which can not only consult a dictionary containing a large amount of lexical information but also recognize unknown words by learning certain characteristics. To learn these characteristics, we focused on information such as the types of characters used in a string. The model has the potential to overcome the unknown word problem. Recall and precision for identifying morpheme segments and their major parts-of-speech were 95.80% and 95.09%, respectively, on the Kyoto University corpus.
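To make the idea of character-type information concrete, the following is a minimal sketch of how such features might be extracted from a candidate string. The abstract does not specify the paper's actual feature set, so the function names `char_type` and `char_type_features`, the feature keys, and the choice of Unicode ranges are all assumptions made for illustration only.

```python
def char_type(ch: str) -> str:
    """Classify one character by script (illustrative categories, not the paper's)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # includes the long-vowel mark U+30FC
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    if ch.isdigit():
        return "digit"
    if "a" <= ch.lower() <= "z":   # ASCII letters only
        return "alphabet"
    return "other"


def char_type_features(candidate: str) -> dict:
    """Build hypothetical character-type features for a candidate morpheme string."""
    if not candidate:
        raise ValueError("candidate must be a non-empty string")
    types = [char_type(c) for c in candidate]
    # Collapse runs of the same type, e.g. kanji, hiragana, hiragana -> kanji+hiragana
    collapsed = [t for i, t in enumerate(types) if i == 0 or t != types[i - 1]]
    return {
        "length": len(candidate),
        "first_type": types[0],
        "last_type": types[-1],
        "type_pattern": "+".join(collapsed),
    }


if __name__ == "__main__":
    for word in ["グーグル", "食べる", "IT革命"]:
        print(word, char_type_features(word))
```

Features of this kind could be passed to a classifier alongside dictionary lookups, so that a string absent from the dictionary still carries some evidence (for example, an all-katakana pattern) about whether it forms a morpheme.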
The Association for Natural Language Processing (言語処理学会) | Papers
- 複合語の分野連想語の効率的決定法
- クラス指向事例収集手法による言い換えコーパスの構築
- 動詞項構造辞書への大規模用例付与
- 言い換え技術に関する研究動向
- Morpho-Syntactic Rules for Detecting Japanese Term Variation: Establishment and Evaluation