A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus Without any Language-Specific Knowledge
Abstract
We propose a flexible and effective framework for extracting bilingual dictionaries from comparable corpora without using any language-specific knowledge such as seeds or additional dictionaries. Our approach is based on a novel combination of topic modeling and word alignment techniques in a pipeline style: first, it converts a comparable document-aligned corpus into a parallel topic-aligned corpus using topic modeling techniques; then, it learns translation relationships between words using word alignment models such as IBM Model 1. Compared with previous work, our framework is advantageous in that it uses only statistical information and requires no language-specific knowledge for initialization. Furthermore, our framework is capable of handling polysemy: for example, it can extract distinct translations for the word "Apple" as a fruit or as a company. Experiments on a large-scale Wikipedia corpus show that our framework reliably extracts high-precision word pairs under a wide variety of comparable data conditions.
- 2012-11-15
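The second stage of the pipeline described in the abstract, learning word translations with IBM Model 1, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `train_ibm_model1`, the toy data `toy_pairs`, and the omission of the NULL word are all assumptions made for illustration. Each pair stands in for the two word lists that a topic model might associate with one shared topic; how the paper actually builds its topic-aligned corpus is not specified here.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) with EM for IBM Model 1.

    Simplified sketch: no NULL source word, uniform initialization,
    so no seed dictionary is needed.
    """
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = {e: {f: uniform for f in tgt_vocab} for e in src_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: share each target word's unit of mass among the
        # source words it co-occurs with, in proportion to t(f | e).
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[e][f] for e in src)
                for e in src:
                    frac = t[e][f] / norm
                    count[e][f] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per source word.
        t = {e: {f: c / total[e] for f, c in fs.items()}
             for e, fs in count.items()}
    return t


# Hypothetical toy data: each pair plays the role of one "topic-aligned"
# unit (source-language words, target-language words).
toy_pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

if __name__ == "__main__":
    table = train_ibm_model1(toy_pairs)
    for e, fs in sorted(table.items()):
        best = max(fs, key=fs.get)
        print(f"{e} -> {best} ({fs[best]:.3f})")
```

Starting from a uniform table mirrors the abstract's claim that only statistical information is used: the translation preferences emerge from co-occurrence alone. Handling polysemy, as in the "Apple" example, would require keeping translation tables per topic, which this sketch does not attempt.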
Authors
- Kevin Duh, NTT Communication Science Laboratories
- Yuji Matsumoto, Nara Institute of Science and Technology
- Kevin Duh, Nara Institute of Science and Technology
- Xiaodong Liu, Nara Institute of Science and Technology
Related papers
- A Statistical Hierarchical Phrase-Based Machine Translation System Incorporating a Word Reordering Model (単語並び換えモデルを考慮した統計的階層句機械翻訳システム)
- POS Tagging using Dependency Information
- Keypads for Large Letter-Set Languages and Small Touch-Screen Devices
- A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus Without any Language-Specific Knowledge
- Patterns for Simplifying Complex Sentences in English-Japanese Machine Translation
- Collocation Suggestion for Japanese Second Language Learners