F-052 Clustering the Normalized Compression Distance for Biological Data
Abstract
This paper reports our recent results supporting the usefulness of the normalized compression distance for classifying virus genome sequences. Specifically, we study the problem of clustering the hemagglutinin (HA) gene sequences of influenza virus genome data with respect to their four serotypes. Hierarchical clustering is compared with spectral clustering via the kLine algorithm of Fischer and Poland (2004), and the standard compressors bzlib, ppmd, and zlib are compared with one another. The results are very promising and show that an (almost) perfect clustering can be obtained for all the problems studied.
- Paper of the FIT (IEICE / Information Processing Society of Japan) Promotion Committee
- 2010-08-20
Authors
-
Ito Kimihito
Research Center for Zoonosis Control, Hokkaido University
-
Zhu Yu
Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido Universi
-
Zeugmann Thomas
Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido Univers
Related papers
- F-052 Clustering the Normalized Compression Distance for Biological Data
- Symmetric item set mining method using zero-suppressed BDDs and application to biological data (Special issue: Data mining and statistical mathematics)