Patterns for Simplifying Complex Sentences in English-Japanese Machine Translation
Abstract
This study examines in detail the problem of complex sentences in English-to-Japanese Statistical Machine Translation (SMT). Complex sentences pose the greatest difficulty for existing SMT systems because of the large word order differences between SVO and SOV languages. To overcome this problem, we take a "divide and rewrite" approach: a complex sentence is divided into simple clauses based on syntactic patterns; the simple clauses are then translated, and the results are pieced together to form the final output. The main challenge of such "divide and rewrite" preprocessing approaches is the construction of syntactic patterns. While previous work focuses on either automatic or manual methods, we pursue a semi-automatic approach. First, we automatically extract and cluster patterns of dependent clauses based on source-side parses. Our novel definition of pattern templates enables us to reduce all sources of syntactic variation to a small set of 100 clusters. It then becomes feasible to construct the corresponding target-side patterns manually. In our experiments, we demonstrate that this cost-effective approach covers 82 percent of all complex sentences and improves BLEU by more than 2 points over the baseline.
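The "divide and rewrite" flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in two toy regular expressions for the parse-based source-side patterns, and the names `PATTERNS`, `divide`, `divide_and_rewrite`, and `translate_clause` are hypothetical. The target-side templates are hand-written examples that place the subordinate clause before the main clause, as Japanese word order requires.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical pattern table: a toy source-side pattern paired with a
# hand-written target-side template. The paper derives its patterns from
# source-side parses; plain regexes are used here only for illustration.
PATTERNS: List[Tuple["re.Pattern[str]", str]] = [
    (re.compile(r"^(?P<main>.+?) because (?P<sub>.+)$"), "{sub}ので、{main}"),
    (re.compile(r"^if (?P<sub>.+?), (?P<main>.+)$", re.IGNORECASE), "もし{sub}なら、{main}"),
]


def divide(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Split a complex sentence into (main clause, dependent clause, template)."""
    text = sentence.strip().rstrip(".")
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("main"), match.group("sub"), template
    return None  # no pattern covers this sentence


def divide_and_rewrite(sentence: str, translate_clause: Callable[[str], str]) -> str:
    """Translate each simple clause separately, then piece the results together."""
    parts = divide(sentence)
    if parts is None:
        # Fall back to translating the whole sentence in one pass.
        return translate_clause(sentence)
    main, sub, template = parts
    return template.format(main=translate_clause(main), sub=translate_clause(sub))


if __name__ == "__main__":
    # Stand-in for a real SMT system: it only marks what would be translated.
    stub = lambda clause: f"<{clause}>"
    print(divide_and_rewrite("He stayed home because it rained.", stub))
```

In this sketch, a sentence that matches no pattern is passed to the translator unchanged, which mirrors the fact that the reported patterns cover some but not all complex sentences.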
- 2012-11-15
Authors
-
Kevin Duh
NTT Communication Science Laboratories
-
Yuji Matsumoto
Nara Institute of Science and Technology
-
Chinh To
Nara Institute of Science and Technology
-
Mamoru Komachi
Nara Institute of Science and Technology
-
Kevin Duh
Nara Institute of Science and Technology
-
Shuhei Kondo
Nara Institute of Science and Technology
Related papers
- A Statistical Hierarchical Phrase-Based Machine Translation System Incorporating a Word Reordering Model (単語並び換えモデルを考慮した統計的階層句機械翻訳システム)
- POS Tagging using Dependency Information
- Keypads for Large Letter-Set Languages and Small Touch-Screen Devices
- A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus Without any Language-Specific Knowledge
- Patterns for Simplifying Complex Sentences in English-Japanese Machine Translation
- Collocation Suggestion for Japanese Second Language Learners