Machine Learning Approach to Multi-Document Summarization.

Abstract

Due to the rapid growth of the Internet and the emergence of low-priced, large-capacity storage devices, the number of online documents is exploding. Automatic summarization is the key to handling this situation. Because manual summarization is costly, we need to be able to automatically summarize a document set related to a certain event. This paper proposes a method of extracting important sentences from document sets. The method is based on Support Vector Machines, a technology that is attracting attention in the field of natural language processing. We conducted experiments using three document sets formed from twelve events published in the MAINICHI newspaper of 1999. These sets were manually processed by newspaper editors. Tests using this corpus show that our method outperforms both the Lead-based method and the TF-IDF method. Moreover, we show that reducing redundancy is not always effective for extracting important sentences from a set of multiple documents taken from a single source.
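To make the two baselines named in the abstract concrete, here is a minimal sketch of Lead-based and TF-IDF sentence extraction. It is an illustration only, not the paper's implementation: the function names, the round-robin ordering in `lead_extract`, the whitespace-free English tokenizer, and the exact TF-IDF weighting are all assumptions. An SVM-based extractor of the kind the abstract proposes would replace the hand-written score with the output of a trained classifier.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokenizer (assumption: English text; Japanese would
    need a morphological analyzer instead)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def lead_extract(documents, n):
    """Lead-based baseline (assumed variant): take leading sentences,
    visiting the documents in round-robin order, until n are collected.

    `documents` is a list of documents, each a list of sentence strings.
    """
    picked = []
    max_len = max((len(doc) for doc in documents), default=0)
    for depth in range(max_len):
        for doc in documents:
            if depth < len(doc) and len(picked) < n:
                picked.append(doc[depth])
    return picked


def tfidf_extract(documents, n):
    """TF-IDF baseline (assumed weighting): score each sentence by the sum of
    tf * log(N / df) over its distinct terms, then keep the top n sentences."""
    num_docs = len(documents)
    doc_tokens = [[tok for sent in doc for tok in tokenize(sent)]
                  for doc in documents]
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in doc_tokens for tok in set(toks))
    scored = []
    for doc, toks in zip(documents, doc_tokens):
        tf = Counter(toks)  # term frequency within this document
        for sent in doc:
            score = sum(tf[t] * math.log(num_docs / df[t])
                        for t in set(tokenize(sent)))
            scored.append((score, sent))
    # The sort is stable, so tied sentences keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [sent for _, sent in scored[:n]]
```

Both functions take the same input shape, so they can be swapped when comparing extraction strategies on one document set. Terms that occur in every document get a weight of zero under this weighting, which is one simple way to discount event-wide vocabulary.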
Papers of the Association for Natural Language Processing