Integration of Multiple Bilingually-Trained Segmentation Schemes into Statistical Machine Translation
Abstract
This paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source language text in order to improve the translation quality of statistical machine translation (SMT) approaches. The method can be applied to any language pair in which the source language is unsegmented and the target language segmentation is known. In the first step, an iterative bootstrap method is applied to learn multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating five Asian languages into English revealed that the proposed method of integrating multiple segmentation schemes outperforms SMT models trained on any of the learned word segmentations and performs comparably to available monolingually built segmentation tools.
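The second step can be illustrated with a minimal sketch. It assumes that "characterizing" the source side means splitting every source token into single characters, so that phrase pairs from differently segmented models become directly comparable. The function names, the dictionary-based phrase-table layout, and the choice to average the scores of identical pairs are all assumptions made for illustration; the abstract does not specify how scores are combined.

```python
from collections import defaultdict


def characterize(source_phrase):
    """Split a segmented source phrase into space-separated characters."""
    return " ".join(ch for token in source_phrase.split() for ch in token)


def merge_phrase_tables(tables):
    """Merge phrase tables whose source sides use different segmentations.

    Each table maps (source_phrase, target_phrase) -> score.
    Pairs that become identical after characterization are merged.
    Averaging their scores is an assumption of this sketch, not
    necessarily what the paper does.
    """
    collected = defaultdict(list)
    for table in tables:
        for (src, tgt), score in table.items():
            collected[(characterize(src), tgt)].append(score)
    return {pair: sum(scores) / len(scores) for pair, scores in collected.items()}


# Hypothetical toy tables from two models with different segmentations.
table_a = {("東京 駅", "Tokyo station"): 0.5, ("東京", "Tokyo"): 0.9}
table_b = {("東京駅", "Tokyo station"): 0.75}

merged = merge_phrase_tables([table_a, table_b])
```

Here the two entries for "Tokyo station" collapse into one pair with the character-level source side "東 京 駅", while the entry for "Tokyo" is carried over unchanged apart from characterization.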