Efficient Algorithms for Semi-Structured Data Mining (Learning & Discovery) (<Special Issue> Doctoral Theses on Artificial Intelligence)
Abstract
With the rapid progress of network and storage technologies, huge amounts of weakly structured data such as Web pages and XML data, called semi-structured data, have become available on the Internet and intranets. Therefore, there is increasing demand for efficient methods that extract rules and patterns from semi-structured data, namely semi-structured data mining. However, semi-structured data are large, complex, and heterogeneous, and are modeled by trees or graphs. Thus, traditional data mining methods for relational databases cannot be applied directly to semi-structured data. Hence, it is an important challenge to develop efficient and scalable methods for semi-structured data mining. In this thesis, we model semi-structured data as labeled trees, and study efficient semi-structured data mining algorithms for various classes of tree patterns. In Chap. 3, we consider the problem of discovering frequent ordered tree patterns from semi-structured data and develop an efficient algorithm, FREQT, for the problem. The key technique is an efficient enumeration technique called rightmost expansion, which enumerates all the ordered trees in O(1) time per pattern. Consequently, FREQT computes all the frequent patterns in O(kb^2 m) time per pattern without duplication, where k is the size of the pattern, b is the maximum branching of an input data tree, and m is the number of occurrences of the pattern. In Chap. 4, we then extend the algorithm FREQT to the optimized pattern discovery problem and give an efficient algorithm, OPTT, for mining optimized ordered tree patterns. In Chap. 5, we consider the frequent unordered tree pattern discovery problem for semi-structured data and develop an efficient algorithm, UNOT, for the problem. In Chap. 6, we study a variant of the frequent tree discovery problem, called frequent pattern mining from semi-structured data streams, where the input to the mining algorithm is not a static dataset but a rapidly changing and unbounded data stream. We develop an online algorithm, StreamT, which discovers all the frequent ordered trees appearing in a given data stream by scanning it once. Finally, we conclude this thesis and discuss future research directions in Chap. 7.
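The rightmost expansion idea mentioned in the abstract can be illustrated with a small sketch. This is not the FREQT algorithm itself: it only enumerates ordered labeled trees and does no frequency counting. It assumes, for illustration, that a tree is encoded as the preorder sequence of (depth, label) pairs, and the names rightmost_expansions and enumerate_trees are hypothetical. A tree with k+1 nodes is obtained from a tree with k nodes by attaching one new node as the rightmost child of some node on the rightmost branch, which in this encoding means appending a pair whose depth lies between 1 and (depth of the last node) + 1.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed encoding: an ordered labeled tree is the preorder sequence of
# (depth, label) pairs, with the root at depth 0.
Tree = Tuple[Tuple[int, str], ...]


def rightmost_expansions(tree: Tree, labels: Sequence[str]) -> Iterator[Tree]:
    """Yield every tree obtained by adding one node on the rightmost branch."""
    last_depth = tree[-1][0]
    # The new node becomes the rightmost child of a node on the rightmost
    # branch, so its depth ranges from 1 up to last_depth + 1.
    for depth in range(1, last_depth + 2):
        for label in labels:
            yield tree + ((depth, label),)


def enumerate_trees(labels: Sequence[str], max_size: int) -> List[Tree]:
    """Enumerate all ordered labeled trees with at most max_size nodes."""
    level: List[Tree] = [((0, label),) for label in labels]
    found: List[Tree] = []
    for size in range(1, max_size + 1):
        found.extend(level)
        if size < max_size:
            # Grow every tree of the current size by exactly one node.
            level = [
                bigger
                for tree in level
                for bigger in rightmost_expansions(tree, labels)
            ]
    return found


if __name__ == "__main__":
    for tree in enumerate_trees(["a", "b"], 2):
        print(tree)
```

Because each tree has a unique preorder sequence and a unique parent (the tree with its last node removed), every tree is generated exactly once, which matches the "without duplication" property the abstract describes. A frequent-pattern miner would additionally prune any tree whose occurrence count falls below a threshold before expanding it.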
- Paper published by the Japanese Society for Artificial Intelligence
- 2005-01-01
Authors
Related papers
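- Research on New Data Processing Methods for the Information Explosion Era and Their Application to Knowledge Discovery Systems (Editorial Committee Members' Aspirations for 2007)
- Efficient Algorithms for Semi-Structured Data Mining (Learning & Discovery) (Doctoral Theses on Artificial Intelligence)
- Online Semi-Structured Data Mining Algorithms Using the Concepts of Sliding Windows and Forgetting
- Online Semi-Structured Data Mining Algorithms Using the Concepts of Sliding Windows and Forgetting
- Efficient Search for Substructure Patterns for Semi-Structured Data Mining
- Pattern Discovery Techniques in Semi-Structured Data Mining
- An Algorithm for Discovering Frequent Substructure Patterns from Large-Scale Tree-Structured Data (String Algorithms)
- An Efficient Method for Discovering Unordered Tree Patterns from Semi-Structured Data
- Fast Substructure Discovery from Large-Scale Tree-Structured Data ("Toward Knowledge Information Science in the 21st Century" and General)
- An Efficient XPath Query Mechanism for Data Stream Processing (Session 4A: Spatio-Temporal Data and Streams)
- An Efficient XPath Query Mechanism for Data Stream Processing (Spatio-Temporal Data and Streams) ("Summer Database Workshop (DBWS2003)", General)
- 3. A Fast Knowledge Discovery System for Large-Scale Semi-Structured Data: Efficient Discovery and Matching of Tree-Structured Patterns (The Expanding Technology of Enumeration: A Problem-Solving Approach through Enumeration)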