F-046 Biological Data Analysis based on Kolmogorov Complexity
Abstract
In this paper, we focus on a simple data mining method called Normalized Compression Distance (NCD), which was proposed by Cilibrasi and Vitanyi. Using this method, we analyzed HA gene sequences of virus data with the available compressors. The built-in compressors zlib and bzip are compared using the CompLearn Toolkit, and a comparison is made between hierarchical and spectral clustering. Our results show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip compressor, and that hierarchical clustering is slightly better than spectral clustering when all data are considered.
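As a rough illustration of the distance named in the abstract, the sketch below computes NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It uses Python's standard `zlib` and `bz2` modules as stand-ins for the compressors; it is not the CompLearn Toolkit and may not match the paper's settings. The names `compressed_size`, `ncd`, and `toy_sequence` are hypothetical, and the input is synthetic random nucleotide data, not the HA sequences from the paper.

```python
import bz2
import random
import zlib


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """Return the length in bytes of `data` after compression."""
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    if compressor == "bzip2":
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown compressor: {compressor}")


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def toy_sequence(seed: int, length: int) -> bytes:
    """Hypothetical stand-in data: a reproducible random ACGT string."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"ACGT") for _ in range(length))


if __name__ == "__main__":
    a = toy_sequence(1, 2000)
    b = toy_sequence(2, 2000)
    for name in ("zlib", "bzip2"):
        print(name, "same:", ncd(a, a, name), "different:", ncd(a, b, name))
```

Computing `ncd` for every pair of sequences gives a distance matrix, which is the kind of input a hierarchical or spectral clustering step would then work from.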
- Paper of the FIT (IEICE and IPSJ) Promotion Committee
- 2009-08-20
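Authors
Related papers
- Conditions for the Existence of Least Generalizations under Relative Subsumption
- An Implementation Method for Inductive Inference Systems Using Bottom Generalization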
- F-035 Detecting Natural Similarities in Scientific Documents : Author versus Content
- F-046 Biological Data Analysis based on Kolmogorov Complexity
- A-022 Relational Properties Expressible with One Universal Quantifier are Testable (Extended Abstract)
- A-015 Implementation and Evaluation of a Significant Fourier Transform Algorithm