Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web (Natural Language Processing)
Abstract
A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, few machine-readable Thai dictionaries are available, and most of them do not satisfy computational requirements. This paper presents the design of a Thai lexicon named the TCL's Computational Lexicon (TCLLEX) and proposes a method for constructing a large-scale Thai lexicon by reusing two existing dictionaries and a large number of texts on the Internet. In addition to the morphological, syntactic, semantic case-role, and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and is then added to the lexicon. In the acquisition of selectional preferences, the Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the acquired selectional preferences.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2006-07-01
Authors
-
Sornlertlamvanich Virach
NICT Asia Research Center
-
Sornlertlamvanich Virach
Thai Computational Linguistics Laboratory, NICT Asia Research Center
-
CHAROENPORN Thatsanee
Sirindhorn International Institute of Technology, Thammasat University
-
KRUENGKRAI Canasai
NICT Asia Research Center
-
THEERAMUNKONG Thanaruk
Sirindhorn International Institute of Technology, Thammasat University
-
KRUENGKRAI Canasai
Thai Computational Linguistics Lab., NICT Asia Research Center
-
Kruengkrai Canasai
National Inst. Information And Communications Technol. Kyoto‐fu Jpn
-
Charoenporn Thatsanee
Sirindhorn International Institute of Technology, Thammasat University
-
Theeramunkong Thanaruk
Thammasat Univ. Tha
-
Theeramunkong Thanaruk
Sirindhorn International Institute of Technology, Thammasat University
Related papers
- An EM-Based Approach for Mining Word Senses from Corpora (Natural Language Processing)
- Statistical-Based Approach to Non-segmented Language Processing (Knowledge, Information and Creativity Support System)
- Improving Search Performance: A Lesson Learned from Evaluating Search Engines Using Thai Queries (Knowledge, Information and Creativity Support System)
- Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web (Natural Language Processing)
- Extracting Chemical Reactions from Thai Text for Semantics-Based Information Retrieval
- Extracting Semantic Frames from Thai Medical-Symptom Unstructured Text with Unknown Target-Phrase Boundaries
- Fast Algorithms for Mining Generalized Frequent Patterns of Generalized Association Rules (Databases)