Finding Functional Features of Proteins using Machine Learning Techniques
Abstract
Protein function prediction from amino-acid sequences is one of the major tasks in genome informatics. To predict the functions of a given amino-acid sequence, we can use similarities among the functions and structural features of amino-acid sequences, i.e., motifs and homology. Previous function prediction methods have difficulties for two reasons: only a few motifs have been found so far, and proteins with similar sequences may not have similar functions. The main objective of our research is to facilitate finding functional features of proteins using machine learning techniques.
Our hypothesis for protein function prediction is that a protein's function arises from its physical structure. Since protein structures are built by physico-chemical interactions among amino acids, amino-acid sequences might have features that reflect these interactions. We call these features 'functional features'. From the tertiary structure of bacteriorhodopsin and the localization of polar amino acids in that structure, we know that electric interactions exist among its alpha-helices. If this localization of amino acids in bacteriorhodopsin is closely related to the protein's function, we can use it as a functional feature to predict protein function.
To create rules for predicting protein functions, we use three machine learning techniques (Fig. 1). The first technique is analogical reasoning, which makes assumptions about functional features. For example, if polar amino acids are localized in some proteins, then, by analogy with bacteriorhodopsin, this localization might imply a relation between the functional features and the functions of those proteins. The second technique is inductive reasoning, which generalizes the hypothesis made by analogical reasoning.
The goal of inductive reasoning in protein function prediction is to decide which localization pattern is most useful for classifying protein functions. The third technique is deductive reasoning, which refines the localization pattern into classification rules. In deductive reasoning, knowledge about protein functions and structures is used to give a logical description of the classification rules.
We have carried out some experiments to implement our idea of finding functional features of proteins using machine learning techniques. First, we simulated the analogical reasoning process to create a hypothesis about the functional features of bacteriorhodopsin, using the ABA framework proposed by the authors [1]. At the current stage of our research this analogical reasoning is simulated by hand, but it will be executed on a computer in the next stage. Next, we analyzed the relation between the functional features and the functions of seven-helix membrane proteins using a cluster analysis method. This analysis showed that the amino-acid interval frequencies of polar amino acids are closely related to some function classes of the classified proteins. The amino-acid interval frequencies are thought to represent the abstract functional feature 'localization of amino acids'. Based on the result of this cluster analysis, we can use these functional features for inductive reasoning in the next step.
In the preliminary experiments described above, we found new functional features for classifying protein functions from amino-acid sequences. Specifically, these features can discriminate between different functions of proteins whose amino-acid sequences are similar in homology analysis. Furthermore, the features can recognize proteins with the same function that do not have similar sequences. From these results, we conclude that our idea is useful for predicting protein functions.
In the next stage of the research, we plan to refine the classification rules and to integrate the three machine learning techniques.
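The abstract does not define 'amino-acid interval frequencies' precisely. A minimal sketch, assuming the feature is the relative frequency of the gaps (in residues) between consecutive polar amino acids along a sequence, might look as follows. The polar-residue set `POLAR`, the cutoff `max_interval`, and the function name `polar_interval_frequencies` are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter

# Assumption: "polar" = charged plus uncharged polar residues (one-letter codes).
# The residue set the authors actually used is not stated in the abstract.
POLAR = set("DEHKNQRSTY")


def polar_interval_frequencies(seq: str, max_interval: int = 10) -> list[float]:
    """Relative frequency of gaps between consecutive polar residues.

    Index i of the result holds the fraction of gaps of length i + 1.
    Gaps longer than max_interval are pooled into the last bin (an assumption).
    Returns all zeros when the sequence has fewer than two polar residues.
    """
    positions = [i for i, aa in enumerate(seq.upper()) if aa in POLAR]
    gaps = [min(b - a, max_interval) for a, b in zip(positions, positions[1:])]
    total = len(gaps)
    if total == 0:
        return [0.0] * max_interval
    counts = Counter(gaps)
    return [counts[k] / total for k in range(1, max_interval + 1)]


print(polar_interval_frequencies("AKAAEAS", max_interval=4))
```

In the toy sequence `AKAAEAS` the polar residues K, E and S sit at positions 1, 4 and 6, giving gaps of 3 and 2, so with `max_interval=4` the call returns `[0.0, 0.5, 0.5, 0.0]`. Each protein then becomes a fixed-length vector that can be compared with others regardless of sequence similarity.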
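The abstract says only that 'a cluster analysis method' was applied to the interval-frequency features. As one possible choice, the sketch below groups feature vectors with average-linkage agglomerative clustering on Euclidean distance. The linkage rule, the distance, the function names and the toy vectors are assumptions made for illustration; the method the authors actually used may differ.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(vectors: list[list[float]], n_clusters: int) -> list[list[int]]:
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the mean pairwise distance between members
    (average linkage). Returns clusters as lists of input indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def cluster_distance(c1: list[int], c2: list[int]) -> float:
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair (a < b) among the current clusters.
        _, a, b = min(
            (cluster_distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical interval-frequency vectors for four proteins (toy data).
features = [
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
]
print(average_linkage(features, n_clusters=2))
```

With these toy vectors the first two and the last two proteins fall into separate clusters, `[[0, 1], [2, 3]]`. Comparing such clusters against known function classes is one way to check whether a feature tracks function rather than sequence homology.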
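For the inductive step, the abstract states only that the goal is to decide which localization pattern is most useful for classifying protein functions. One conventional way to make that decision, swapped in here purely for illustration, is to score every feature (for example, each interval-frequency bin) by the information gain of its best single threshold split over known function labels, as a one-level decision tree would. The function names, labels and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(vectors: list[list[float]], labels: list[str]) -> tuple[int, float, float]:
    """Return (feature index, threshold, information gain) of the best single split.

    Returns (-1, 0.0, 0.0) if no split yields a positive gain.
    """
    n = len(labels)
    base = entropy(labels)
    best = (-1, 0.0, 0.0)
    for f in range(len(vectors[0])):
        values = sorted({v[f] for v in vectors})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for v, y in zip(vectors, labels) if v[f] <= t]
            right = [y for v, y in zip(vectors, labels) if v[f] > t]
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if gain > best[2]:
                best = (f, t, gain)
    return best


# Hypothetical feature vectors with known function classes (toy data).
features = [[0.6, 0.1], [0.5, 0.2], [0.1, 0.15], [0.0, 0.12]]
functions = ["A", "A", "B", "B"]
print(best_split(features, functions))
```

Here feature 0 separates the two toy classes perfectly (gain of 1.0 bit), so it would be chosen as the most useful pattern. A rule of the form "feature 0 above threshold implies class A" is also the kind of result that a later deductive step could restate as a logical classification rule.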
- Papers of the Japanese Society for Bioinformatics
- Performance Improvement in Protein N-Myristoyl Classification by BONSAI with Insignificant Indexing Symbol
- A combined pathway to simulate CDK-dependent phosphorylation and ARF-dependent stabilization for p53 transcriptional activity
- A versatile petri net based architecture for modeling and simulation of complex biological processes
- XML documentation of biopathways and their simulations in Genomic Object Net
- Prediction of debacle points for robustness of biological pathways by using recurrent neural networks