Learning Korean Named Entity by Bootstrapping with Web Resources (Natural Language Processing)
Abstract
A central problem in applying machine learning to Natural Language Processing tasks such as Named Entity Recognition is the lack of tagged corpora. Several bootstrapping methods, such as co-training, have been proposed as a solution. In this paper, we present a different approach that uses Web resources. A Named Entity (NE) tagged corpus is generated from the Web using about 3,000 names as seeds. The generated corpus may be of lower quality than a manually tagged corpus, but its size can be increased sufficiently. Several features are developed, and a decision list is learned from the generated corpus. We verify our method by comparing it with both a decision list learned on the manual corpus and the DL-CoTrain method. To improve performance, we also present a two-level classification that cascades highly precise lexical patterns and the decision list.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2004-12-01
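The two-level classification described in the abstract can be illustrated with a rough sketch. The abstract does not give the paper's features, scoring function, or lexical patterns, so everything below is an assumption made for illustration: the smoothed log-odds score, the toy English feature strings, and the function names learn_decision_list, classify, and cascade are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def learn_decision_list(examples, alpha=0.1):
    """Learn a decision list from (features, label) pairs.

    Assumption: rules are ranked by a smoothed log-odds score of one
    label against all others; the paper's actual scoring is not stated.
    """
    counts = defaultdict(Counter)  # feature -> label -> count
    for features, label in examples:
        for feature in set(features):
            counts[feature][label] += 1

    rules = []
    for feature, by_label in counts.items():
        total = sum(by_label.values())
        for label, count in by_label.items():
            rest = total - count
            score = math.log((count + alpha) / (rest + alpha))
            if score > 0:  # keep only rules that favor their label
                rules.append((score, feature, label))
    # Strongest evidence first; ties broken by name for determinism.
    rules.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rules


def classify(features, rules, default=None):
    """Return the label of the first (highest-scoring) matching rule."""
    present = set(features)
    for _score, feature, label in rules:
        if feature in present:
            return label
    return default


def cascade(features, patterns, rules, default=None):
    """Level 1: high-precision lexical patterns. Level 2: decision list."""
    for feature in features:
        if feature in patterns:
            return patterns[feature]
    return classify(features, rules, default)


# Toy, hand-made data (hypothetical; stands in for a Web-generated corpus).
train = [
    (["suffix=Inc", "prev=at"], "ORG"),
    (["suffix=Inc", "prev=by"], "ORG"),
    (["prev=Mr"], "PER"),
    (["prev=Mr", "next=said"], "PER"),
    (["next=said", "prev=at"], "ORG"),
]
rules = learn_decision_list(train)
patterns = {"next=City": "LOC"}  # hypothetical high-precision pattern

print(classify(["prev=Mr"], rules))
print(cascade(["next=City", "prev=Mr"], patterns, rules))
```

In this sketch the lexical patterns are consulted first and the decision list only handles cases the patterns do not cover, which is one plausible reading of "cascading" in the abstract; an ambiguous feature such as next=said produces no rule because its evidence is evenly split between labels.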
Authors
- Kwak Byung-kwan (NLP Lab., POSTECH)
- Lee Seungwoo (NLP Lab., POSTECH)
- Lee Gary (NLP Lab., POSTECH)
- An Joohui (NLP Lab., POSTECH)
Related papers
- Use of Dynamic Passage Selection and Lexico-Semantic Patterns for Japanese Natural Language Question Answering (Special Issue on Text Processing for Information Access)
- Learning Korean Named Entity by Bootstrapping with Web Resources (Natural Language Processing)