An Algorithm for Finding Frequently Appearing Long String Patterns from Large Scale Databases
Abstract
We propose a new algorithm for the frequent string mining problem that allows approximate matches. The algorithm first computes the similarity between the strings in the database and enumerates the clusters generated by that similarity. We then compute a representative string for each cluster; these representatives are our frequent strings. Further, by taking majority votes, we extend the obtained representatives to obtain long frequent strings. Our computational experiments show the efficiency of both the model and the algorithm: we were able to find string patterns that appear many times in the data and that are long but not particularly numerous. The computation time of our method is short in practice, for example 20 minutes even for a genomic sequence of 100 million letters.
- 2013-10-30
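The majority-vote idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the strings in a cluster have equal length, that similarity is measured by Hamming distance, and that the names `hamming`, `majority_vote`, `frequent_representative`, `max_distance`, and `min_support` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions (assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def majority_vote(cluster: list[str]) -> str:
    """Build a representative by taking the most common letter per column."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    if any(len(s) != len(cluster[0]) for s in cluster):
        raise ValueError("strings must have equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*cluster)
    )


def frequent_representative(cluster: list[str], max_distance: int, min_support: int):
    """Return the representative if enough cluster members approximately
    match it (hypothetical thresholds), otherwise None."""
    rep = majority_vote(cluster)
    support = sum(hamming(rep, s) <= max_distance for s in cluster)
    return rep if support >= min_support else None


if __name__ == "__main__":
    cluster = ["ACGT", "ACGA", "TCGT"]
    print(majority_vote(cluster))
    print(frequent_representative(cluster, max_distance=1, min_support=3))
```

Here a cluster is simply a list of similar strings; how clusters are enumerated from the similarity computation is described in the paper and is not reproduced in this sketch.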
Authors
- Takeaki Uno, National Institute of Informatics
- Tsuyoshi Koide, National Institute of Genetics
- Juzoh Umemori, Fujita Health University
Related papers
- The Complexity of Free Flood Filling Games
- Hardness Results and an Exact Exponential Algorithm for the Spanning Tree Congestion Problem
- On the number of reduced trees, cographs, and series-parallel graphs by compression
- Algorithm to Generate All Connected Simple Graphs of Given Order
- An Algorithm for Finding Frequently Appearing Long String Patterns from Large Scale Databases
- Toward Constant Time Enumeration