This paper stresses the effectiveness of a stochastic approach to analyzing genetic information such as DNA and protein sequences. AI technologies, especially machine learning, are attractive for extracting valuable information from the enormous amounts of raw genetic data generated by biologists. To achieve this, however, more flexible and robust learning methodologies are required to cope with the divergence found in genetic information. In this paper, we show how a stochastic approach, comprising stochastic knowledge representations and stochastic learning algorithms, works for knowledge discovery from genetic information, using a motif extraction system as an example. The motif extraction system aims to extract stable common patterns (motifs) conserved within a protein category. In the system, motifs are regarded as stochastic rules (stochastic motifs), and a genetic algorithm combined with Rissanen's minimum description length (MDL) principle is used as the learning algorithm. The MDL principle enables us to select "good stochastic motifs" by balancing the complexity of a motif against its fit to the training data. This paper also discusses our experience of extracting stochastic motifs from superfamilies in the PIR protein database, a comparison of the MDL principle and the maximum likelihood method in the context of genetic algorithms, and a hidden Markov model (HMM) representation of stochastic motifs.
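The trade-off that the MDL principle expresses can be illustrated with a minimal sketch. It is not the coding scheme or the genetic algorithm used in the paper. It assumes a crude two-part code in which a motif is a plain substring, the model cost is the number of bits needed to spell the pattern plus half of log2(n) bits per probability parameter, and the data cost is the number of bits needed to encode the class labels given whether the motif matches. The names `data_bits`, `description_length`, and the toy sequences are hypothetical, and the candidates are compared exhaustively rather than evolved.

```python
import math

def data_bits(k, n):
    """Bits to encode k positive labels among n, using the ML estimate k/n."""
    if n == 0 or k == 0 or k == n:
        return 0.0  # a pure (or empty) group costs nothing to encode
    p = k / n
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p))

def description_length(pattern, sequences, labels, alphabet_size=20):
    """Two-part description length (assumed form): model bits + data bits."""
    n = len(sequences)
    hits = [pattern in s for s in sequences]
    fit = 0.0
    for side in (True, False):
        # Labels of the sequences that match (or do not match) the pattern.
        group = [lab for h, lab in zip(hits, labels) if h == side]
        fit += data_bits(sum(group), len(group))
    # Model cost: one symbol per pattern position, plus two probability
    # parameters at (1/2) * log2(n) bits each (an assumed, common choice).
    model = len(pattern) * math.log2(alphabet_size) + 2 * 0.5 * math.log2(n)
    return model + fit

# Toy data: label 1 marks members of a hypothetical protein category.
sequences = ["ACDKG", "TTCDK", "CDKPP", "GGGGG", "AAAAA", "PPPPP"]
labels = [1, 1, 1, 0, 0, 0]
candidates = ["", "C", "CDK"]

best = min(candidates, key=lambda m: description_length(m, sequences, labels))
print(best)
```

On this toy data, the empty pattern is cheap to describe but fits the labels poorly, while "CDK" fits perfectly but costs more bits than the equally well-fitting "C". A pure maximum likelihood criterion would not distinguish "C" from "CDK", since it ignores the model cost.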
- Paper of the Japanese Society for Artificial Intelligence (社団法人人工知能学会)
- 1993-07-01
