Bilingual Cluster Based Models for Statistical Machine Translation
Abstract
We propose a domain-specific model for statistical machine translation. Domain-specific language models are well known to perform well in automatic speech recognition, and we show that domain-specific language and translation models also benefit statistical machine translation. However, using domain-specific models raises two problems. The first is data sparseness, which we address with an adaptation technique. The second is domain prediction. Adaptation requires the domain to be given, but in many cases the domain is unknown or changes dynamically. In such cases, not only the translation of the source sentence but also its domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora, and each sub-corpus is treated as a domain. The domain of a source sentence is predicted from its similarity to the sub-corpora, and the language and translation models specific to the predicted domain (sub-corpus) are then used for translation decoding. This approach improved the BLEU score on the IWSLT05 Japanese-to-English evaluation corpus by 2.7 points, from 52.4 to 55.1. This substantial gain supports the validity of the proposed bilingual cluster based models.
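The domain-prediction and adaptation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and several assumptions are made for illustration only: each sub-corpus is represented by an add-one-smoothed unigram model over source-side tokens, "similarity" is approximated by the sentence's log-likelihood under each sub-corpus model, and adaptation is shown as linear interpolation with a general model trained on the whole corpus. The function names, the weight `lam`, and the toy data are hypothetical; the paper's clustering criterion, similarity measure, adaptation method, and its translation models are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical sketch: a "model" is (token counts, total token count) for one corpus.
def train_unigram(sentences):
    """Count tokens over a list of tokenized sentences."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return counts, sum(counts.values())

def unigram_prob(tok, model, vocab_size, alpha=1.0):
    """Add-alpha smoothed unigram probability, so unseen tokens never get zero."""
    counts, total = model
    return (counts[tok] + alpha) / (total + alpha * vocab_size)

def predict_domain(sentence, domain_models, vocab_size):
    """Return the index of the sub-corpus model that gives the sentence the highest
    log-likelihood (an assumed stand-in for the paper's similarity measure)."""
    def score(model):
        return sum(math.log(unigram_prob(t, model, vocab_size)) for t in sentence)
    return max(range(len(domain_models)), key=lambda i: score(domain_models[i]))

def adapted_prob(tok, domain_model, general_model, vocab_size, lam=0.7):
    """Assumed adaptation: linearly interpolate the predicted domain's model with a
    general model trained on all data, to soften sub-corpus data sparseness."""
    return (lam * unigram_prob(tok, domain_model, vocab_size)
            + (1 - lam) * unigram_prob(tok, general_model, vocab_size))

# Toy source-side sub-corpora (hypothetical data standing in for two clusters).
sub_corpora = [
    [["where", "is", "the", "station"], ["how", "far", "is", "the", "airport"]],
    [["i", "would", "like", "a", "room"], ["check", "out", "please"]],
]
domain_models = [train_unigram(c) for c in sub_corpora]
general_model = train_unigram([s for c in sub_corpora for s in c])
vocab_size = len({t for c in sub_corpora for s in c for t in s})

domain = predict_domain(["where", "is", "the", "airport"], domain_models, vocab_size)
print(domain)  # → 0
print(adapted_prob("airport", domain_models[domain], general_model, vocab_size))
```

Interpolating with the general model keeps estimates from a small sub-corpus from collapsing to the smoothing floor. In a full system the same idea would presumably apply to n-gram language models and phrase tables rather than unigram counts.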
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-03-01
Authors
- YAMAMOTO Hirofumi: National Institute of Information and Communications Technology, Kyoto, Japan; ATR Spoken Language Translation Research Laboratories, Kyoto, Japan
- SUMITA Eiichiro: National Institute of Communications Technology, Kyoto, Japan
Related papers
- Constraining a Generative Word Alignment Model with Discriminative Output
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity(Natural Language Processing)
- E_019 Achilles: A Chinese Morphological Analyzer
- Imposing Constraints from the Source Tree on ITG Constraints for SMT
- Introducing a Translation Dictionary into Phrase-Based SMT
- Training Set Selection for Building Compact and Efficient Language Models
- Statistical Language Model Adaptation with Additional Text Generated by Machine Translation
- Paraphrase Lattice for Statistical Machine Translation
- An Empirical Comparison of Parsers in Constraining Reordering for E-J Patent Machine Translation
- Japanese Argument Reordering Based on Dependency Structure for Statistical Machine Translation
- Database of Human Evaluations of Machine Translation Systems for Patent Translation
- How to Translate Dialects: A Segmentation-Centric Pivot Translation Approach
- Joint Phrase Alignment and Extraction for Statistical Machine Translation