Building a Thai part-of-speech tagged corpus (ORCHID).
Abstract
ORCHID (Open linguistic Resources CHanelled toward InterDisciplinary research) is an initiative aimed at building linguistic resources to support research in, but not limited to, natural language processing. Because the project follows an open architecture design, the resources must be fully compatible with similar resources, and software tools must also be made available. This paper presents one result of the project, the construction of a Thai part-of-speech (POS) tagged corpus, which is a preliminary stage in the construction of a Thai speech corpus. The POS-tagged corpus is the result of collaborative research between the Communications Research Laboratory (CRL) in Japan and the National Electronics and Computer Technology Center (NECTEC) in Thailand, with technical support from the Electrotechnical Laboratory (ETL) in Japan. In this paper, we propose a new tagset, based on the results of a prior multilingual machine translation project. The corpus is annotated on three levels: the paragraph, sentence, and word levels. Text information is maintained in the form of text information lines and number lines, both of which are used in data retrieval. Both word segmentation and POS tagging were carried out with a probabilistic trigram model. Rules for syllable demarcation were additionally used to reduce the number of candidates when computing tagging probabilities.
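The abstract does not give the details of the probabilistic trigram model, so the following is only a minimal sketch of how a generic trigram POS tagger can work, not the ORCHID implementation. It assumes the text is already segmented into words (the paper handles segmentation and tagging together), uses add-one smoothing as a stand-in for whatever estimation the authors used, and runs on a hypothetical toy English corpus. The names `train`, `viterbi`, and `START` are illustrative only.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical padding tag for the start of a sentence


def train(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and word/tag emissions."""
    tri = defaultdict(int)        # (t_{i-2}, t_{i-1}, t_i) -> count
    ctx = defaultdict(int)        # (t_{i-2}, t_{i-1}) -> count
    emit = defaultdict(int)       # (tag, word) -> count
    tag_count = defaultdict(int)  # tag -> count
    vocab = set()
    for sent in tagged_sentences:
        prev2, prev1 = START, START
        for word, tag in sent:
            tri[(prev2, prev1, tag)] += 1
            ctx[(prev2, prev1)] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev2, prev1 = prev1, tag
    return tri, ctx, emit, tag_count, vocab


def viterbi(words, model):
    """Return the most probable tag sequence under the trigram model."""
    tri, ctx, emit, tag_count, vocab = model
    tags = sorted(tag_count)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words

    def log_trans(a, b, c):
        # Add-one smoothed P(c | a, b); an assumption, not the paper's method.
        return math.log((tri[(a, b, c)] + 1) / (ctx[(a, b)] + n_tags))

    def log_emit(tag, word):
        # Add-one smoothed P(word | tag).
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + n_words))

    # Each state is the pair of the two most recent tags.
    best = {(START, START): (0.0, [])}
    for word in words:
        nxt = {}
        for (a, b), (score, path) in best.items():
            for c in tags:
                s = score + log_trans(a, b, c) + log_emit(c, word)
                if (b, c) not in nxt or s > nxt[(b, c)][0]:
                    nxt[(b, c)] = (s, path + [c])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy corpus, for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train(corpus)
print(viterbi(["the", "cat", "barks"], model))
```

In this sketch every tag is a candidate for every word. The syllable demarcation rules mentioned in the abstract would play the role of pruning such candidates before probabilities are computed.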
The Acoustical Society of Japan | Papers
- How large is the individual difference in hearing sensitivity?: Establishment of ISO 28961 on the statistical distribution of hearing thresholds of otologically normal young persons
- Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis
- Vocal cord vibration in the production of consonants: Observation by means of high-speed digital imaging using a fiberscope
- The early reflections of the impulse response in an auditorium.
- Multiple reflections between rigid plane panels.