A Consideration on the Methodology for Evaluating Large-scale Paraphrase Lexicons
Abstract
Over the last decade, researchers have proposed methods for extracting sub-sentential paraphrases from various types of corpora, aiming to create paraphrase lexicons that cover the target classes of paraphrases well while containing a low proportion of incorrect information. Once a paraphrase lexicon is created, the next issue is how to measure its quality. This is typically done through a substitution test: each sampled pair of expressions is judged to be a correct paraphrase pair or not by evaluating the grammaticality and meaning equivalence of the expressions in actual sentences. In this paper, we describe the issues in evaluating paraphrase lexicons. Then, focusing on a widely used evaluation scheme, i.e., the substitution test for samples, we propose three extensions designed to obtain more consistent human judgments: (i) classification-based evaluation criteria, (ii) a two-step unit-wise evaluation procedure, and (iii) re-evaluation of disagreed examples. Through an evaluation experiment, we have confirmed that at least the third extension contributes to improving the inter-evaluator agreement ratio.
- 2013-11-07
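
The abstract refers to an inter-evaluator agreement ratio and to re-evaluating disagreed examples without defining either here. As a rough illustration only, the sketch below assumes the ratio is the simple fraction of sampled pairs on which two evaluators give the same label; the paper may define it differently. The function names and the labels are hypothetical and are not data from the paper.

```python
def agreement_ratio(judgments_a, judgments_b):
    """Fraction of samples on which two evaluators gave the same label.

    Assumption: plain observed agreement, not a chance-corrected measure.
    """
    if len(judgments_a) != len(judgments_b):
        raise ValueError("both evaluators must judge the same samples")
    if not judgments_a:
        raise ValueError("at least one sample is required")
    matches = sum(a == b for a, b in zip(judgments_a, judgments_b))
    return matches / len(judgments_a)


def disagreed_indices(judgments_a, judgments_b):
    """Indices of samples the two evaluators labeled differently."""
    return [i for i, (a, b) in enumerate(zip(judgments_a, judgments_b)) if a != b]


# Hypothetical first-pass labels for four sampled paraphrase pairs.
first_pass_a = ["correct", "incorrect", "correct", "correct"]
first_pass_b = ["correct", "correct", "correct", "incorrect"]

ratio_before = agreement_ratio(first_pass_a, first_pass_b)
to_recheck = disagreed_indices(first_pass_a, first_pass_b)

# Hypothetical outcome of re-evaluation: evaluator B revises sample 1 only.
second_pass_b = list(first_pass_b)
second_pass_b[1] = "incorrect"
ratio_after = agreement_ratio(first_pass_a, second_pass_b)

print(ratio_before, to_recheck, ratio_after)
```

In this made-up example only the disagreed samples are sent back for a second look, which mirrors the third extension as the abstract describes it; the numbers say nothing about the size of the effect reported in the paper.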