XML Documents Searching Combining Structure and Keywords Similarities
Abstract
In recent years, XML has increasingly become a standard and is widely used in many applications. For example, office documents, which are growing in popularity, are stored as archives of multiple XML parts. The structure and content of XML files play different roles depending on the kind of document. Therefore, similarity search over XML files should be based on both structure and content. In previous work, LAX+ was presented as an algorithm for computing a similarity value from the structure and content of the XML files in office documents. However, because LAX+ uses exact matching between corresponding leaves, similar words in leaf nodes are treated as different. To solve this problem, we propose combining LAX+ with keyword similarity in leaf nodes. We use the docx, xlsx, and pptx file formats as the experimental data set. The evaluation shows that our approach can improve precision and recall.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2013-07-15
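The limitation described in the abstract, that exact matching treats similar leaf text as entirely different, can be illustrated with a small sketch. This is not LAX+ and not the authors' method: the function names, the sample documents, the best-match averaging, and the choice of Jaccard token overlap as the keyword similarity are all assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; they differ in one word of one leaf.
DOC_A = "<doc><p>annual sales report</p><p>Tokyo</p></doc>"
DOC_B = "<doc><p>annual sales summary</p><p>Tokyo</p></doc>"


def leaf_texts(xml_string):
    """Return the stripped text of every leaf element (no child elements)."""
    root = ET.fromstring(xml_string)
    return [
        el.text.strip()
        for el in root.iter()
        if len(el) == 0 and el.text and el.text.strip()
    ]


def exact_match(a, b):
    """Score 1.0 only when the two leaf texts are identical."""
    return 1.0 if a == b else 0.0


def keyword_similarity(a, b):
    """Jaccard overlap of lowercase word sets (an assumed similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def average_leaf_score(doc_a, doc_b, score):
    """Average, over leaves of doc_a, of the best score against any leaf of doc_b."""
    leaves_a, leaves_b = leaf_texts(doc_a), leaf_texts(doc_b)
    if not leaves_a or not leaves_b:
        return 0.0
    best = [max(score(a, b) for b in leaves_b) for a in leaves_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    print("exact:  ", average_leaf_score(DOC_A, DOC_B, exact_match))
    print("keyword:", average_leaf_score(DOC_A, DOC_B, keyword_similarity))
```

Under exact matching, the leaf "annual sales report" contributes nothing because no leaf in the second document is identical to it, while the token-overlap score gives it partial credit for the two shared words. This sketch ignores document structure entirely, which the paper's approach takes into account.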
Authors
-
Yokota Haruo
Tokyo Inst. Of Technol.
-
WATANABE YOUSUKE
Tokyo Institute of Technology
-
AUVATTANASOMBAT APICHAYA
Tokyo Institute of Technology
Related papers
- Comparing Hadoop and Fat-Btree based access method for Small File I/O Applications
- Handling Data of Arbitrary Size on Autonomous Disks (自律ディスクにおける任意サイズのデータの取り扱い)
- An Evaluation on Power Consumption and Performance Balancing Distributed Storage Systems
- XML Documents Searching Combining Structure and Keywords Similarities