One Click One Revisited: Enhancing Evaluation based on Information Units
Abstract
This paper extends the evaluation framework of the NTCIR-9 One Click Access Task (1CLICK-1), which required systems to return a single, concise textual output in response to a query in order to satisfy the user immediately after a click on the SEARCH button. Unlike traditional nugget-based summarisation and question answering evaluation methods, S-measure, the official evaluation measure of 1CLICK-1, discounts the value of each information unit based on its position within the textual output. We first show that the discount parameter L of S-measure affects system ranking and discriminative power, and that using multiple values, e.g. L = 250 (the user has only 30 seconds to view the text) and L = 500 (the user has one minute), is beneficial. We then complement the recall-like S-measure with a simple, precision-like measure called T-measure, as well as a combination of S-measure and T-measure called S#. We show that S# with a heavy emphasis on S-measure imposes an appropriate length penalty on 1CLICK-1 system outputs and yet achieves discriminative power comparable to that of S-measure. These new measures will be used at NTCIR-10 1CLICK-2.
- 2012-07-25
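The measures named in the abstract can be illustrated with a rough sketch. The abstract only states that S-measure discounts each information unit by its position with a parameter L, that T-measure is precision-like, and that S# combines the two with emphasis on S-measure. The linear discount `max(0, L - offset)`, the normalisation against an ideal output, the length-ratio form of T-measure, the F-beta-style combination, and all names (`Match`, `s_measure`, `t_measure`, `s_sharp`, `beta`) are assumptions made here for illustration and are not confirmed by the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """One information unit found in a text (hypothetical structure)."""
    weight: float      # importance assigned to the information unit
    offset: int        # character position at which the unit appears
    vital_length: int  # assumed minimal number of characters needed to convey it


def discounted_gain(matches: List[Match], L: int) -> float:
    # Assumed linear position discount: a unit at or beyond position L earns nothing.
    return sum(m.weight * max(0, L - m.offset) for m in matches)


def s_measure(system: List[Match], ideal: List[Match], L: int = 500) -> float:
    # Recall-like: discounted gain of the system output relative to an ideal output.
    denom = discounted_gain(ideal, L)
    return discounted_gain(system, L) / denom if denom > 0 else 0.0


def t_measure(system: List[Match], output_length: int) -> float:
    # Precision-like: fraction of the output length spent on useful content.
    if output_length <= 0:
        return 0.0
    return sum(m.vital_length for m in system) / output_length


def s_sharp(s: float, t: float, beta: float = 10.0) -> float:
    # Assumed F-beta-style combination; beta > 1 puts more emphasis on S-measure.
    denom = beta ** 2 * t + s
    return (1 + beta ** 2) * t * s / denom if denom > 0 else 0.0


if __name__ == "__main__":
    ideal = [Match(3.0, 10, 10), Match(1.0, 20, 10)]
    system = [Match(3.0, 260, 10)]  # important unit placed late in a 300-character output
    for L in (250, 500):
        s = s_measure(system, ideal, L)
        t = t_measure(system, output_length=300)
        print(L, round(s, 3), round(t, 3), round(s_sharp(s, t), 3))
```

Under these assumptions, the same output can score zero at L = 250 and a positive value at L = 500, which is one way to see why the choice of L can change system rankings.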
Authors
Related papers
- Japanese Hyponymy Extraction based on a Term Similarity Graph
- Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
- The Reusability of a Diversified Search Test Collection
- Web Search Evaluation with Informational and Navigational Intents (Preprint)
- A Preview of the NTCIR-10 INTENT-2 Results
- How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures