A Bayesian Model of Transliteration and Its Human Evaluation When Integrated into a Machine Translation System
Abstract
The contribution of this paper is two-fold. Firstly, we conduct a large-scale real-world evaluation of the effectiveness of integrating an automatic transliteration system with a machine translation system. Human evaluation is usually preferable to automatic evaluation, and especially so here, because common machine translation evaluation metrics are sensitive to the length of the translations they score and are often biased by translation length rather than by the information conveyed. We evaluate our transliteration system on data collected in field experiments conducted all over Japan. Our results conclusively show that using a transliteration system can improve machine translation quality when translating unknown words.
Our second contribution is to propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling, implemented with an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the overfitting problem inherent in maximum likelihood training. We demonstrate the effectiveness of our Bayesian segmentation by using it to build a translation model for a phrase-based statistical machine translation (SMT) system trained to perform transliteration by monotonic transduction from character sequence to character sequence. The Bayesian segmentation was used to construct a phrase-table, and we compared the quality of this phrase-table with one generated in the usual manner, by the state-of-the-art GIZA++ word alignment process combined with the phrase extraction heuristics of the MOSES statistical machine translation system. Both phrase-tables were used to perform transliteration generation within an identical framework.
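The sampling scheme named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform-character base measure, the concentration `alpha`, the character inventory size `vocab`, and the segment length limit `max_len` are all assumptions made for illustration. Each word pair is re-segmented as a block: its current segments are removed from the counts, a forward pass sums the probability of all monotonic co-segmentations, and a backward pass samples one of them.

```python
import random
from collections import Counter


def pair_prob(src, tgt, counts, total, alpha=1.0, vocab=50):
    """Dirichlet-process-style predictive probability of one segment pair.

    The base measure (uniform over characters) is an assumption of this sketch.
    """
    base = (1.0 / vocab) ** (len(src) + len(tgt))
    return (counts[(src, tgt)] + alpha * base) / (total + alpha)


def candidates(source, target, i, j, fwd, counts, total, max_len):
    """Yield (a, b, weight) for every segment pair that could end at (i, j)."""
    for a in range(1, min(max_len, i) + 1):
        for b in range(1, min(max_len, j) + 1):
            p = pair_prob(source[i - a:i], target[j - b:j], counts, total)
            yield a, b, fwd[i - a][j - b] * p


def sample_segmentation(source, target, counts, total, max_len=3, rng=random):
    n, m = len(source), len(target)
    # Forward filtering: fwd[i][j] sums over all co-segmentations of the prefixes.
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fwd[i][j] = sum(
                w for _, _, w in candidates(source, target, i, j, fwd, counts, total, max_len)
            )
    if fwd[n][m] == 0.0:
        raise ValueError("no co-segmentation exists within max_len")
    # Backward sampling: walk back from the end, choosing each segment pair
    # in proportion to its share of the forward probability.
    i, j, segments = n, m, []
    while i > 0:
        options = [
            c for c in candidates(source, target, i, j, fwd, counts, total, max_len) if c[2] > 0
        ]
        a, b, _ = rng.choices(options, weights=[w for _, _, w in options])[0]
        segments.append((source[i - a:i], target[j - b:j]))
        i, j = i - a, j - b
    return segments[::-1]


def gibbs(corpus, iterations=5, seed=0, max_len=3):
    """Blocked Gibbs sampling: resample one whole word pair at a time."""
    rng = random.Random(seed)
    counts, current = Counter(), [None] * len(corpus)
    for _ in range(iterations):
        for k, (src, tgt) in enumerate(corpus):
            if current[k] is not None:
                counts.subtract(current[k])  # remove this pair's old segments
            total = sum(counts.values())
            current[k] = sample_segmentation(src, tgt, counts, total, max_len, rng)
            counts.update(current[k])  # add the newly sampled segments
    return counts, current
```

Calling `gibbs([("tokyo", "トーキョー"), ("kyoto", "キョート")])` returns the segment-pair counts and the segmentation of each word pair from the last sweep. Segment pairs with nonzero counts could then be read off as phrase-table entries, in the spirit of the final-iteration sample described in the abstract.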
In our experiments on English-Japanese data from the NEWS2010 transliteration generation shared task, we used our technique to bilingually co-segment the training corpus. We then derived a phrase-table from the segmentation given by the sample at the final iteration of the training procedure, and substituted it directly for the phrase-table extracted using GIZA++/MOSES. The phrase-table resulting from our Bayesian segmentation model was approximately 30% smaller than that produced by the SMT system's training procedure, and gave an increase in transliteration quality measured in terms of both word accuracy and F-score.
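The two quality measures can be sketched as follows. The abstract does not define them, so this sketch assumes the definitions commonly used for transliteration shared tasks: word accuracy as the fraction of exact matches, and F-score as the harmonic mean of character-level precision and recall computed from the longest common subsequence (LCS) of the candidate and the reference. It also assumes a single reference per word, and the function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def f_score(candidate, reference):
    """LCS-based F-score for one candidate against one reference (assumed definition)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def word_accuracy(candidates, references):
    """Fraction of candidates that match their reference exactly."""
    matches = sum(c == r for c, r in zip(candidates, references))
    return matches / len(references)
```

Under these assumptions, a candidate that differs from the reference by one extra character counts as an error for word accuracy but still earns most of the F-score credit, which is why the two measures are usually reported together.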