Improving N-gram Distribution for Sampling-based Alignment by Extraction of Longer N-grams
Abstract
Translation tables are an essential component of statistical machine translation systems. The sampling-based alignment method is one way of building translation tables. It has advantages in speed and accuracy, but lags slightly behind the standard alignment technique in translation evaluations. Previous research has shown that the sampling-based alignment method does not generate enough long N-gram alignments, which harms translation evaluation scores. This paper investigates in detail the translation tables obtained by the sampling-based alignment method and introduces an improved distribution for allotting time to different N-gram lengths. The new model helps output more long N-grams. We report significant improvements in BLEU scores on 110 Europarl corpus language pairs.
- 2014-01-30
Authors
-
Yves Lepage
Graduate School of Information, Production and Systems, Waseda University
-
Juan Luo
Graduate School of IPS, Waseda University
-
Suchen Zhang
Graduate School of IPS, Waseda University
-
Yves Lepage
Graduate School of IPS, Waseda University
Related papers
- Evaluation of Analogy-based Translation of Chunks obtained by Marker-based Chunking
- Improving N-gram Distribution for Sampling-based Alignment by Extraction of Longer N-grams