Learning similarity functions for multi-platform gene expression data

Abstract

The existence of several technologies for measuring gene expression and the growing number of available large-scale gene expression microarrays motivate the need for cross-platform analysis tools. Cross-platform analysis of microarray data is an important problem that relies heavily on the choice of a similarity function. For a classification task, a good similarity function should improve prediction performance. It should also be easy to compute and provide new biological insights into the data. In practice, however, choosing a good similarity function for multi-platform microarray data is difficult. In this work, our goal is to improve the performance of microarray search engines such as CellMontage. We therefore focus on the ranking task rather than the classification task. Our ranking-based approach compares favourably to several similarity functions, including the Pearson and Spearman correlation coefficients, the Euclidean distance, Linear Discriminant Analysis, and Neighbourhood Component Analysis. Experiments show that our method can distinguish different types of cells with high accuracy, including induced pluripotent stem cells, embryonic stem cells, and cancer cells.
2011-03-03
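
The following is a minimal sketch of the ranking task using three of the baseline similarity functions named in the abstract (Pearson correlation, Spearman correlation, and Euclidean distance). It is not the learned similarity function proposed in the paper. The function names, the `rank_database` helper, and the toy expression profiles are all assumptions made for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))


def ordinal_ranks(x):
    """Ordinal ranks of the values in x (assumption: ties are not averaged)."""
    return np.argsort(np.argsort(x)).astype(float)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ordinal_ranks(x), ordinal_ranks(y))


def neg_euclidean(x, y):
    """Negated Euclidean distance, so that larger values mean more similar."""
    return -float(np.linalg.norm(x - y))


def rank_database(query, database, similarity):
    """Return database row indices ordered from most to least similar."""
    scores = np.array([similarity(query, row) for row in database])
    return np.argsort(-scores, kind="stable")


# Hypothetical toy profiles (four genes each); not data from the paper.
query = np.array([1.0, 2.0, 3.0, 4.0])
database = np.array([
    [10.0, 20.0, 30.0, 40.0],  # same profile measured on a different scale
    [4.0, 3.0, 2.0, 1.0],      # reversed profile
    [1.0, 2.0, 3.0, 5.0],      # nearly identical profile
])

for name, sim in [("pearson", pearson),
                  ("spearman", spearman),
                  ("neg_euclidean", neg_euclidean)]:
    print(name, rank_database(query, database, sim).tolist())
```

On this toy data the correlation-based functions place the rescaled profile first, while the Euclidean distance places it last, which shows how the choice of similarity function can change the ranking returned for the same query.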