Using Constituent Boundary Parsing for Multi-lingual Spoken-language Translation

Abstract

We propose a method called constituent boundary parsing, which uses pattern matching on the surface form. A new version of Transfer-Driven Machine Translation (TDMT) that combines constituent boundary parsing with example-based processing is effective for multi-lingual spoken-language translation. Constituent boundary parsing describes the syntactic structures of various expressions in a uniform way, using surface patterns that consist of variables and constituent boundaries. Input words are read from left to right, and the best syntactic structure is built up efficiently with a chart-parsing algorithm while local structures are disambiguated. Introducing constituent boundary parsing solves two problems of the earlier version of TDMT: the limited descriptive power of its syntactic structures and the explosion of structural ambiguity. In addition, because constituent boundary parsing and example-based processing are simple and language-independent, TDMT has become more applicable to multi-lingual spoken-language translation. We evaluated a TDMT system that translates between Japanese and English and between Japanese and Korean in the domain of travel conversations. Experimental results show that the proposed TDMT can translate a wide range of sentences in the domain into understandable output in real time.
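As a rough illustration of the idea described in the abstract, the following toy sketch matches surface patterns of the form "X marker Y", where X and Y are variables and the marker is a constituent boundary, and picks the best-scoring structure for an ambiguous input. Everything in it is an assumption made for illustration and is not taken from the paper: the marker set, the example store, the head rule, and the scoring are hypothetical, and the search uses memoized span recursion rather than the left-to-right chart algorithm that the abstract mentions.

```python
from functools import lru_cache

# Hypothetical boundary markers; each stands for a surface pattern "X <marker> Y".
MARKERS = {"to", "at"}

# Hypothetical example store: (marker, head word of X, head word of Y).
# It stands in for example-based processing, which is not modeled in detail here.
EXAMPLES = {
    ("to", "bus", "hotel"),
    ("at", "bus", "noon"),
}


def head(tree):
    """Head word of a tree: the word itself, or the head of its X part."""
    return tree if isinstance(tree, str) else head(tree[1])


def parse(words):
    """Return the best-scoring (marker, X, Y) tree over all words, or None."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best (score, tree) for words[i:j]; None if no pattern covers the span.
        if j - i == 1:
            return None if words[i] in MARKERS else (0, words[i])
        candidates = []
        # Try every marker position k that leaves a non-empty X and Y.
        for k in range(i + 1, j - 1):
            if words[k] not in MARKERS:
                continue
            left, right = best(i, k), best(k + 1, j)
            if left is None or right is None:
                continue
            # Reward structures whose heads match a stored example.
            hit = (words[k], head(left[1]), head(right[1])) in EXAMPLES
            score = left[0] + right[0] + (1 if hit else 0)
            candidates.append((score, (words[k], left[1], right[1])))
        return max(candidates, key=lambda c: c[0]) if candidates else None

    result = best(0, len(words))
    return None if result is None else result[1]


print(parse("bus to hotel at noon".split()))
```

For the input "bus to hotel at noon", two bracketings are possible. The sketch prefers the one in which "at noon" attaches to the whole phrase "bus to hotel", because both of its sub-structures match an entry in the hypothetical example store.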
Papers of the Association for Natural Language Processing