Improving Vietnamese Word Segmentation and POS Tagging using MEM with Various Kinds of Resources
Abstract
Word segmentation and POS tagging are two important problems that underlie many NLP tasks. They have not, however, drawn much attention from Vietnamese researchers. In this paper, we focus on combining the advantages of several resources to improve the accuracy of Vietnamese word segmentation and POS tagging. For word segmentation, we propose a solution that draws on multiple knowledge resources, including a dictionary-based model, an N-gram model, and a named entity recognition model, and integrates them into a Maximum Entropy model. Experiments on a public corpus show that this approach is effective compared with the best current models, achieving an F1 measure of 95.30%. For POS tagging, motivated by Chinese research and by the characteristics of Vietnamese, we present a new kind of feature based on the idea of word composition, which we call morpheme-based features. Our experiments on two POS-tagged corpora show that morpheme-based features consistently give promising results. In the best case, we obtained 89.64% precision on a Vietnamese POS-tagged corpus using a Maximum Entropy model.
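The abstract does not list the actual feature templates, so the following is only a minimal sketch of how dictionary-based, N-gram, and named-entity cues might be merged into one feature set per syllable for a Maximum Entropy classifier. The names `TOY_DICTIONARY`, `syllable_features`, and `sentence_features`, the begin/inside labelling of syllables, and the use of capitalization as a stand-in for a named entity recognizer are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon; a real system would load a full Vietnamese dictionary.
TOY_DICTIONARY: Set[str] = {"học sinh", "sinh học", "học"}


def syllable_features(syllables: List[str], i: int, dictionary: Set[str]) -> Dict[str, object]:
    """Features for deciding whether syllable i begins a word or continues one."""
    prev_syl = syllables[i - 1].lower() if i > 0 else "<s>"
    curr_syl = syllables[i].lower()
    next_syl = syllables[i + 1].lower() if i + 1 < len(syllables) else "</s>"
    prev_curr = prev_syl + " " + curr_syl
    curr_next = curr_syl + " " + next_syl
    return {
        # N-gram style context features (unigram and the two surrounding bigrams).
        "curr": curr_syl,
        "prev+curr": prev_curr,
        "curr+next": curr_next,
        # Dictionary-based features: does a neighbouring syllable pair form a known word?
        "dict_prev_curr": prev_curr in dictionary,
        "dict_curr_next": curr_next in dictionary,
        # Crude stand-in for a named entity recognizer (assumption): capitalization.
        "capitalized": syllables[i][:1].isupper(),
    }


def sentence_features(syllables: List[str], dictionary: Set[str]) -> List[Dict[str, object]]:
    """Build one feature dictionary per syllable of a whitespace-tokenized sentence."""
    return [syllable_features(syllables, i, dictionary) for i in range(len(syllables))]


if __name__ == "__main__":
    for feats in sentence_features("Học sinh học sinh học".split(), TOY_DICTIONARY):
        print(feats)
```

Feature dictionaries of this shape can be vectorized and passed to any multinomial logistic regression trainer, which is the same model family as a Maximum Entropy classifier. The point of the design is that each knowledge resource only contributes features, so the classifier learns how much weight to give each one.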
Authors
- Ha Thuy, College of Technology, Vietnam National University Hanoi
- Tran Oanh, College of Technology, Vietnam National University Hanoi
- Le Cuong, College of Technology, Vietnam National University Hanoi