Detecting New Words from Chinese Text Using Latent Semi-CRF Models
Abstract
New words and their part-of-speech (POS) tags are particularly problematic in Chinese natural language processing. With the rapid development of the internet and information technology, it is impossible to maintain a complete system dictionary for Chinese natural language processing, because new words outside the basic system dictionary are constantly being created. A latent semi-CRF model, which combines the strengths of LDCRF (Latent-Dynamic Conditional Random Field) and semi-CRF, is proposed to detect new words of any type together with their POS tags in a single step, directly from Chinese text that has not been pre-segmented. Unlike in the original semi-CRF, the LDCRF is used to generate the candidate entities for training and testing the latent semi-CRF, which speeds up training and reduces the computational cost. The complexity of the latent semi-CRF can be further adjusted by tuning the number of hidden variables in the LDCRF and the number of candidate entities taken from the N-best outputs of the LDCRF. A new-word-generating framework is proposed for model training and testing, under which the definitions and distributions of the new words conform to those found in real text. Specific features called “Global Fragment Information” are adopted for new word detection and POS tagging in model training and testing. The experimental results show that the proposed method can detect even low-frequency new words together with their POS tags, and that the proposed model performs competitively with the state-of-the-art models.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2010-06-01
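The abstract's idea of restricting the semi-CRF to candidate entities can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical candidate list of `(start, end, pos_tag, score)` tuples (standing in for segments taken from the N-best outputs of a first-stage tagger), uses made-up scores, and ignores transition scores between adjacent segment labels, which a full semi-CRF would include. The function name `best_segmentation` is likewise an assumption made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate segment: (start, end, pos_tag, score), end exclusive.
Candidate = Tuple[int, int, str, float]
Segment = Tuple[int, int, str]


def best_segmentation(n: int, candidates: List[Candidate]) -> Tuple[float, List[Segment]]:
    """Find the highest-scoring segmentation of n characters.

    Only segments from `candidates` may be used, so the search cost grows
    with the number of candidates rather than with every possible span.
    Simplification: segment scores are summed independently, with no
    label-to-label transition term.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)  # best[i]: best score covering chars [0, i)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0

    # Sorting by end position guarantees best[start] is final before use,
    # because every valid segment has start < end.
    for start, end, tag, score in sorted(candidates, key=lambda c: c[1]):
        if not (0 <= start < end <= n) or best[start] == neg_inf:
            continue
        total = best[start] + score
        if total > best[end]:
            best[end] = total
            back[end] = (start, tag)

    if best[n] == neg_inf:
        raise ValueError("candidates do not cover the whole sentence")

    # Follow back-pointers from the end of the sentence to recover segments.
    segments: List[Segment] = []
    pos = n
    while pos > 0:
        start, tag = back[pos]
        segments.append((start, pos, tag))
        pos = start
    segments.reverse()
    return best[n], segments


if __name__ == "__main__":
    sentence = "我爱北京"
    # Made-up candidates and scores, for illustration only.
    cands = [
        (0, 1, "r", 1.0),
        (1, 2, "v", 1.0),
        (2, 3, "n", 0.5),
        (3, 4, "n", 0.5),
        (2, 4, "ns", 2.5),
    ]
    score, segs = best_segmentation(len(sentence), cands)
    for s, e, tag in segs:
        print(sentence[s:e], tag)
```

Keeping fewer candidates per sentence shrinks the search space, which is the kind of trade-off between cost and coverage the abstract attributes to tuning the number of N-best candidates.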
Authors
-
SUN Xiao
School of Computer Science & Technology, Dalian Nationality University
-
HUANG Degen
Department of Computer Science & Engineering, Dalian University of Technology
-
REN Fuji
Department of Information Science & Intelligent Systems, Tokushima University
-
Sun Xiao
School Of Agriculture And Biology And Sjtu Research Center For Low Carbon Agriculture Shanghai Jiao Tong University
Related papers
- Detecting New Words from Chinese Text Using Latent Semi-CRF Models
- A Boltzmann Machine with Non-rejective Move (Special Section on Papers Selected from ITC-CSCC 2001)
- A New Question Answering System for Chinese Restricted Domain (Language, Human Communication II)
- Undoped White Organic Light-Emitting Diodes Utilizing Two Sources of Excitons
- Stoichiometric traits of oriental oak (Quercus variabilis) acorns and their variations in relation to environmental variables across temperate to subtropical China
- Ultraviolet Lasing Phenomenon of Zinc Oxide Hexagonal Microtubes
- Electrically Tunable Three-Dimensional Holographic Photonic Crystal Made of Polymer-Dispersed Liquid Crystals Using a Single Prism