Wikipedia version tree reconstruction by clustering revisions through keywords
スポンサーリンク
概要
- 論文の詳細を見る
As the widespread diffusion of user generated contents, documents having past versions are rapidly growing, especially among the field of wiki contents and office documents. Take Wikipedia for example, it has been the world's largest collaboratively edited source of encyclopedic knowledge. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. For each article, Wikipedia provides a method to export an XML file of an edit history having timestamps, which is essential to evaluate trustworthiness and provenance of the article. The problem is that even though there is an edit history, it is still hard to know how an article has evolved. A tree structure is embedded in the linear structure of the timestamps. To overcome this problem, we propose a version tree reconstruction method by clustering versions through keywords. A version tree can explain how a document has evolved through collaborative editing as well as illuminate dependencies among documents. In this paper, we will show experimental evaluation on a number of edit histories from Wikipedia to validate how our proposed method works.
- 2011-07-26