Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora (Natural Language Processing)
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns the word associations by consulting a bilingual dictionary and calculates the correlation between the senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the disparity in topical coverage between the corpora of the two languages, as well as the ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and the clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance: the F-measure of its sense selection was 74.6%, compared with a baseline of 62.8%. The developed method could potentially be extended into a fully unsupervised method that features automatic division and definition of word senses.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-02-01
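The sense-selection step described in the abstract (choose the sense that maximizes a weighted sum of sense-vs.-clue correlations over the clues found in the context) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_sense`, the dictionary layout of the correlation matrix, the example values, and the default weight of 1.0 per clue are all assumptions made for illustration, and the iterative calculation of the matrix itself is not shown.

```python
from typing import Dict, Iterable, Optional

# Hypothetical sense-vs.-clue correlation matrix for one target word ("bank"),
# stored as {sense: {clue: correlation}}. All values are made up for illustration.
CORRELATION: Dict[str, Dict[str, float]] = {
    "financial_institution": {"loan": 0.9, "interest": 0.7, "water": 0.0},
    "river_side": {"loan": 0.0, "interest": 0.1, "water": 0.8},
}


def select_sense(
    correlation: Dict[str, Dict[str, float]],
    context: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the sense with the highest weighted sum of correlations
    over the clues that appear in the context of the instance."""
    # Assumption: clues without an explicit weight get weight 1.0.
    weights = weights or {}
    # Assumption: a clue counts once even if it occurs several times in the context.
    context_words = set(context)
    scores: Dict[str, float] = {}
    for sense, clue_corr in correlation.items():
        scores[sense] = sum(
            weights.get(clue, 1.0) * corr
            for clue, corr in clue_corr.items()
            if clue in context_words
        )
    # On a tie, the sense listed first in the matrix is returned.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sentence = "the bank raised interest on the loan".split()
    print(select_sense(CORRELATION, sentence))
```

In this toy matrix, a context containing "interest" and "loan" scores 1.6 for the financial sense and 0.1 for the river sense, so the financial sense is selected. How the weights are actually defined is not stated in the abstract.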
Authors
- Morimoto Yasutsugu, Central Research Laboratory, Hitachi, Ltd.
- Kaji Hiroyuki, Central Research Laboratory, Hitachi, Ltd.
- Kaji Hiroyuki, Central Research Institute, Itoham Food Inc.
Related papers
- Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora (Natural Language Processing)
- Primary Structure, Expression, and Site-Directed Mutagenesis of Inorganic Pyrophosphatase from Bacillus stearothermophilus
- Molecular Cloning, Expression, and Site-Directed Mutagenesis of Inorganic Pyrophosphatase from Thermus thermophilus HB8
- Adapting a Bilingual Dictionary to Domains (Natural Language Processing)
- Significance of the Highly Conserved Gly-4 Residue in Human Cystatin A
- Molecular Cloning, Enhancement of Expression Efficiency and Site-Directed Mutagenesis of Rat Epidermal Cystatin A
- A New UV Method for Serum γ-Glutamyltransferase Assay Using Recombinant 4-Aminobenzoate Hydroxylase as a Coupling Enzyme
- Overexpression in Escherichia coli of Chemically Synthesized Gene for Active 0.19 α-Amylase Inhibitor from Wheat Kernel
- Characterization of Copper Atoms in Bilirubin Oxidase by Spectroscopic Analyses
- Extracting Translation Equivalents from Bilingual Comparable Corpora (Natural Language Processing)