Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences
Abstract
This paper considers mining infrequent patterns from biological sequences. One existing algorithm for this task, FPCS (Finding Peculiar Composite Strings), determines two substrings x and y from the given data and evaluates their concatenation xy in a model-driven manner. Although its effectiveness has already been shown, FPCS requires a background set of sequences in addition to the target set. In this paper, we propose another approach to finding infrequent patterns which, given a single set of sequences, finds string patterns consisting of two substrings that are each frequent in that set. Because it needs no background set, the proposed approach is simpler than FPCS. We evaluate its effectiveness using biological features, such as RNA, of widely studied bacterial DNA sequences. For B. subtilis and C. perfringens, the proposed approach finds RNA regions as well as FPCS does, whereas for E. coli and S. enterica it fails to do so, because FPCS is finer-grained than the proposed approach.
- 2013-07-15
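The abstract does not state the scoring function, so the following is only a minimal sketch of the general idea under stated assumptions: both substrings x and y are fixed-length k-mers, "frequent" means at least `min_freq` overlapping occurrences in the single input set, and a concatenation xy counts as infrequent when its observed count is at most `max_ratio` times the count expected if x and y occurred independently at adjacent positions. The names `kmer_counts`, `infrequent_pairs`, `k`, `min_freq`, and `max_ratio` are hypothetical, and the independence model is a simple stand-in, not the authors' method or the model used by FPCS.

```python
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count every length-k substring across all sequences (overlapping occurrences)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def infrequent_pairs(seqs, k, min_freq, max_ratio):
    """Hypothetical sketch: return (x, y, observed, expected) tuples for frequent
    k-mers x and y whose concatenation xy is under-represented in the same set.

    Assumption: expected(xy) = N_2k * P(x) * P(y), i.e. x and y are treated as
    independent; this is an illustrative model, not the one from the paper.
    """
    single = kmer_counts(seqs, k)
    pair = kmer_counts(seqs, 2 * k)
    n_single = sum(single.values())  # number of k-mer positions in the set
    n_pair = sum(pair.values())      # number of 2k-mer positions in the set
    if n_pair == 0:
        return []  # sequences too short to contain any concatenation xy

    # Substrings that are frequent in the single input set (no background set).
    frequent = [w for w, c in single.items() if c >= min_freq]

    results = []
    for x, y in product(frequent, repeat=2):
        expected = n_pair * (single[x] / n_single) * (single[y] / n_single)
        observed = pair[x + y]  # Counter returns 0 for unseen keys
        if observed <= max_ratio * expected:
            results.append((x, y, observed, expected))

    # Most under-represented concatenations first (lowest observed/expected ratio).
    results.sort(key=lambda r: r[2] / r[3])
    return results
```

For example, on the toy set `["AAAAACCCCC"]` with `k=1`, `min_freq=3`, and `max_ratio=0.5`, both A and C are frequent, but the junctions CA (never observed) and AC (observed once) fall at or below half of their expected count of 2.25, so they are reported in that order. Real use on genomic data would need variable-length substrings and a proper statistical test, which this sketch deliberately omits.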