Creating a Noisy Parallel Corpus from Newswire Articles Using Cross-Language Information Retrieval (Special Issue: Multimedia Communication and Distributed Processing)
Abstract
In this paper we present an adaptation of cross-language information retrieval for the production of an aligned bilingual corpus from noisy-parallel English-Japanese newswire articles. We implement the standard vector space model and show through simulation the effectiveness of five variations for the alignment task. The methods are computationally efficient, easy to evaluate, and generalizable to other genres and language pairs, an important factor if we are to use the aligned articles for knowledge acquisition in unrestricted domains. Our results show that alignment precision levels of over 70% at 70% recall are possible.
- Paper of the Information Processing Society of Japan (IPSJ)
- 1999-01-15
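To illustrate the kind of vector space alignment the abstract describes, here is a minimal sketch rather than the authors' implementation. It assumes that the Japanese articles have already been mapped to English terms (for example with a bilingual dictionary or machine translation), that TF-IDF weighting with cosine similarity is used, and that a pair is kept only above a similarity threshold. The function names, the weighting formula, and the `threshold` value are all hypothetical; the record does not specify the five variations that were evaluated.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf-idf weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            # Assumed weighting: raw term frequency times a smoothed IDF.
            term: count * math.log(1 + n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(queries, candidates, threshold=0.1):
    """Pair each query article with its most similar candidate article.

    Returns (query_index, candidate_index, score) tuples; pairs scoring
    below the (hypothetical) threshold are dropped.
    """
    if not queries or not candidates:
        return []
    vectors = tfidf_vectors(queries + candidates)
    q_vecs, c_vecs = vectors[:len(queries)], vectors[len(queries):]
    pairs = []
    for qi, q in enumerate(q_vecs):
        scores = [cosine(q, c) for c in c_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((qi, best, scores[best]))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted (query, candidate) index pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: Japanese articles assumed to be already translated into English terms.
translated_ja = [["bank", "rate", "cut", "yen"], ["election", "vote", "party"]]
english = [["party", "wins", "election", "vote"],
           ["central", "bank", "cut", "rate"],
           ["storm", "hits", "coast"]]
pairs = align(translated_ja, english)
```

Raising `threshold` trades recall for precision, which is how a precision figure at a given recall level, like the one quoted in the abstract, can be read off a set of scored pairs.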
Authors
-
Collier N
Communication And Information System's Laboratory Research And Development Center Toshiba Corpo
-
Nigel Collier
Communication And Information Systems Research Laboratories Research And Development Center Toshiba
-
Collier Nigel
The University of Tokyo
-
Collier N
Associate Professor National Institute Of Informatics
-
Hirakawa Hideki
Communication and Information Systems Research Laboratories Research and Development Center, Toshiba
-
HIRAKAWA Hideki
Genetic Regulation, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University
-
COLLIER NIGEL
Communication and Information System's Laboratory, Research and Development Center, Toshiba Corporat
-
KUMANO AKIRA
Communication and Information System's Laboratory, Research and Development Center, Toshiba Corporat
-
Kumano A
Communication And Information System's Laboratory Research And Development Center Toshiba Corpo
-
Hirakawa H
Genetic Regulation Graduate School Of Bioresource And Bioenvironmental Sciences Kyushu University
-
Kumano Akira
Communication and Information System's Laboratory, Research and Development Center, Toshiba Corporation
Related papers
- 4P-5 Creating a Tagged Corpus from Medical and Biological Literature
- Creating a Tagged Corpus from Medical and Biological Papers
- Cross-Language Information Access : a case study for English and Japanese
- Cross-Language Information Access :a case study for English and Japanese
- Mutational Analysis of Amino Acid Residues Involved in Catalytic Activity of a Family 18 Chitinase from Tulip Bulbs(Biochemistry & Molecular Biology)
- Comparison of Innovation Policy and Transfer of Technology from Public Institutions in Japan, France, Germany and the United Kingdom : Part 1 French and Japanese Case study
- Creating a Noisy Parallel Corpus from Newswire Articles Using Cross-Language Information Retrieval (Special Issue: Multimedia Communication and Distributed Processing)
- English-Japanese news article alignment from the Internet using MT