A Novel Bioinformatic Strategy for Unveiling Hidden Genome Signatures of Eukaryotes: Self-Organizing Map of Oligonucleotide Frequency
Abstract
With the increasing amount of available genome sequences, novel tools are needed for comprehensive analysis of species-specific sequence characteristics for a wide variety of genomes. We used an unsupervised neural network algorithm, Kohonen's self-organizing map (SOM), to analyze di- and trinucleotide frequencies in 9 eukaryotic genomes of known sequences (a total of 1.2 Gb): S. cerevisiae, S. pombe, C. elegans, A. thaliana, D. melanogaster, Fugu, and rice, as well as P. falciparum chromosomes 2 and 3, and human chromosomes 14, 20, 21, and 22, which have been almost completely sequenced. Each genomic sequence, cut into windows of different sizes, was encoded as a 16- and a 64-dimensional vector giving the relative frequencies of di- and trinucleotides, respectively. From analysis of a total of 120,000 nonoverlapping 10-kb sequences and overlapping 100-kb sequences with a moving step size of 10 kb, derived from a total of the 1.2 Gb genomic sequences, clear species-specific separations of most sequences were obtained with the SOMs. The unsupervised algorithm could recognize, in most of the 120,000 10-kb sequences, the species-specific characteristics (key combinations of oligonucleotide frequencies) that are signature representations of each genome. Because the classification power is very high, the SOMs can provide fundamental bioinformatic strategies for extracting a wide range of genomic information that could not otherwise be obtained.
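The encoding step described in the abstract can be illustrated with a minimal sketch. It cuts a sequence into windows and turns each window into a vector of relative oligonucleotide frequencies (16 dimensions for dinucleotides, 64 for trinucleotides). The function names `kmer_frequencies`, `windows`, and `encode` are hypothetical, and several choices are assumptions not stated in the abstract: oligonucleotides are counted at every position on a single strand, and words containing ambiguous bases such as N are skipped. SOM training itself is not shown.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers, in lexicographic ACGT order.

    Assumption: words containing symbols other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        # No countable word in this window: return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


def windows(seq, size, step):
    """Yield full-length windows; step == size gives nonoverlapping windows."""
    for start in range(0, len(seq) - size + 1, step):
        yield seq[start:start + size]


def encode(seq, size, step, k):
    """One frequency vector per window (input rows for a SOM)."""
    return [kmer_frequencies(w, k) for w in windows(seq, size, step)]
```

With real data, `encode(genome, 10_000, 10_000, 2)` would correspond to the nonoverlapping 10-kb dinucleotide case, and `encode(genome, 100_000, 10_000, 3)` to the overlapping 100-kb trinucleotide case with a 10-kb step.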
Papers from the Japanese Society for Bioinformatics
- Performance Improvement in Protein N-Myristoyl Classification by BONSAI with Insignificant Indexing Symbol
- A combined pathway to simulate CDK-dependent phosphorylation and ARF-dependent stabilization for p53 transcriptional activity
- A versatile petri net based architecture for modeling and simulation of complex biological processes
- XML documentation of biopathways and their simulations in Genomic Object Net
- Prediction of debacle points for robustness of biological pathways by using recurrent neural networks