Joint Phrase Alignment and Extraction for Statistical Machine Translation
Abstract
The phrase table, a scored list of bilingual phrases, lies at the center of phrase-based machine translation systems. We present a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level. The key contribution of this work is that while previous methods have generally only modeled phrases at one level of granularity, in the proposed method phrases of many granularities are included directly in the model. This allows for the direct learning of a phrase table that achieves competitive accuracy without the complicated multi-step process of word alignment and phrase extraction that is used in previous research. The model is achieved through the use of non-parametric Bayesian methods and inversion transduction grammars (ITGs), a variety of synchronous context-free grammars (SCFGs). Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the more traditional two-step word alignment/phrase extraction approach while reducing its phrase table to a fraction of its original size.
- 2012-03-15
Authors
-
Graham Neubig
Graduate School of Informatics, Kyoto University | National Institute of Information and Communications
-
Taro Watanabe
National Institute of Information and Communications Technology
-
Eiichiro Sumita
National Institute of Information and Communications Technology
-
Shinsuke Mori
Graduate School of Informatics, Kyoto University
-
Tatsuya Kawahara
Graduate School of Informatics, Kyoto University
Related papers
- Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data
- Joint Phrase Alignment and Extraction for Statistical Machine Translation
- Comparison of Discriminative Models for Lexicon Optimization for ASR of Agglutinative Language
- An Empirical Comparison of Parsers in Constraining Reordering for E-J Patent Machine Translation (preprint)
- Partial and Synchronized Caption Generation to Enhance the Listening Comprehension Skills of Second Language Learners
- Classifier-based Data Selection for Lightly-Supervised Training of Acoustic Model for Lecture Transcription