Web Page Classification using Anchor-related Text Extracted by a DOM-based Method
Abstract
Directory services are popular among people who search for information of interest on the Web. These services provide hierarchical categories for finding a user's favorite pages. Pages on the Web are assigned to one of the categories by hand. Many existing studies classify a web page using the text in the page itself. Recently, some studies have used text not only from the target page to be categorized, but also from the pages that link to the target page. The text taken from those linking pages has to be narrowed down, because they include many text parts that are not related to the target page. However, these studies always apply a single extraction method to all pages: although web pages differ greatly in format, the extraction method is never adapted. We have already developed a method for extracting anchor-related text, and we use the text parts extracted by this method to classify web pages. The experimental results showed that our extraction method improves classification accuracy.
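To make the idea of narrowing down text in a linking page concrete, the following is a minimal sketch and not the authors' DOM-based method, whose details are not given in this abstract. It assumes well-formed HTML in which every block element is explicitly closed, treats `p`, `li`, `td`, and `div` as block boundaries, and keeps only the text of the nearest enclosing block that contains a link to the target. The names `AnchorContextParser`, `extract_anchor_related_text`, and `BLOCK_TAGS` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit the text region related to an anchor.
BLOCK_TAGS = {"p", "li", "td", "div"}


class AnchorContextParser(HTMLParser):
    """Collects the text of each block that directly contains a link to target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.stack = []      # one entry per open block: [text_parts, has_target_link]
        self.contexts = []   # extracted anchor-related text, one string per block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append([[], False])
        elif tag == "a" and self.stack:
            if dict(attrs).get("href") == self.target_url:
                self.stack[-1][1] = True  # mark the nearest enclosing block

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][0].append(data)  # text goes to the innermost open block

    def handle_endtag(self, tag):
        # Assumes blocks are properly nested and explicitly closed.
        if tag in BLOCK_TAGS and self.stack:
            parts, has_link = self.stack.pop()
            if has_link:
                # Collapse runs of whitespace into single spaces.
                self.contexts.append(" ".join("".join(parts).split()))


def extract_anchor_related_text(html, target_url):
    """Return the text of every block in html that links to target_url."""
    parser = AnchorContextParser(target_url)
    parser.feed(html)
    parser.close()
    return parser.contexts
```

The extracted strings could then be combined with the target page's own text as input to any text classifier. A fixed rule like this one is exactly the kind of single extraction method the abstract criticizes, so it serves only as a baseline illustration.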
Authors
- Hung Bui, Graduate School of Engineering Science, Osaka University
- Hijikata Yoshinori, Graduate School of Engineering Science, Osaka University
- Nishida Shogo, Graduate School of Engineering Science, Osaka University
- Otsubo Masanori, Graduate School of Engineering Science, Osaka University
Related papers
- Web Page Classification using Anchor-related Text Extracted by a DOM-based Method
- Extraction of Semantic Text Portion Related to Anchor Link (Language, Human Communication II)
- NTM-Agent: Text Mining Agent for Net Auction (Human Communication I)
- Special Section on Human Communication I
- Estimating Reviewer Credibility Using Review Contents and Review Histories