Merging Microarray Data, Robust Feature Selection, and Predicting Prognosis in Prostate Cancer

Motivation: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies.

Method: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes.

Results: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models.
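
The abstract does not say exactly how the Kolmogorov-Smirnov test was applied to the merged data. The following is a minimal Python sketch under stated assumptions. It uses the two-sample form of the test and assumes each study's values for a gene are first standardized within that study. That standardization is a hypothetical transform, not necessarily the authors' merging method. The statistic D is the largest gap between the two empirical distribution functions. Its p-value is approximated with the asymptotic Kolmogorov series p ≈ 2 Σ (-1)^(j-1) exp(-2 j² λ²), where λ = sqrt(n1·n2 / (n1 + n2)) · D. The function names `standardize` and `ks_two_sample` are hypothetical.

```python
import math
from bisect import bisect_right


def standardize(values):
    """Hypothetical within-study transform: mean 0, sample SD 1.

    Assumes at least two distinct values (otherwise SD is zero).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample KS test."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    # Both empirical CDFs are right-continuous step functions, so the
    # largest gap between them occurs at one of the pooled sample points.
    d = 0.0
    for x in a + b:
        f1 = bisect_right(a, x) / n1
        f2 = bisect_right(b, x) / n2
        d = max(d, abs(f1 - f2))
    # Asymptotic Kolmogorov distribution with effective sample size n1*n2/(n1+n2).
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:
        # Q(lam) is indistinguishable from 1 here, and the series converges slowly.
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

For one gene, `ks_two_sample(standardize(affy_values), standardize(cdna_values))` returns D and an approximate p-value, where `affy_values` and `cdna_values` are hypothetical per-study lists. A large p-value is consistent with, though not proof of, the two studies sharing one distribution after the transform. The asymptotic p-value is only a rough guide for very small samples.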

The best logistic regression model, built with stepwise forward selection, used 7 genes and had a misclassification rate of 31%. Models that combined linear discriminant analysis (LDA) with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%.

Availability: Affymetrix U95Av2 array data are available at http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi.

The cDNA microarray data are available through the Stanford Microarray Database (http://cmgm.stanford.edu/pbrown/). GeneLink software is freely available at http://bioinformatics.mdanderson.org/GeneLink/. DNA-Chip Analyzer software is publicly available at http://biosun1.harvard.edu/complab/dchip/.
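
The results above compare feature-selection strategies by cross-validated misclassification rate, meaning the fraction of held-out samples predicted wrongly. They also note that the selected gene sets changed across folds. The following is a minimal Python sketch of that kind of evaluation, and it makes two substitutions. First, the abstract does not describe how RGFS works, so the sketch uses plain greedy forward selection, which is not RGFS. Second, it uses diagonal LDA (genes treated as uncorrelated, equal class priors) instead of full LDA so that no matrix inversion is needed. All function names are hypothetical. The design point is that gene selection is redone inside every leave-one-out fold. That keeps the error estimate honest and records how much the chosen genes vary.

```python
def dlda_fit(X, y, genes):
    """Diagonal LDA (a simplification of full LDA): class means, pooled per-gene variances."""
    classes = sorted(set(y))
    means = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = {g: sum(r[g] for r in rows) / len(rows) for g in genes}
    var = {}
    for g in genes:
        ss = sum((x[g] - means[lab][g]) ** 2 for x, lab in zip(X, y))
        var[g] = max(ss / (len(X) - len(classes)), 1e-12)  # guard against zero variance
    return classes, means, var


def dlda_predict(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    classes, means, var = model
    return min(classes, key=lambda c: sum((x[g] - means[c][g]) ** 2 / var[g] for g in var))


def error_rate(X, y, genes):
    """Resubstitution misclassification rate of diagonal LDA on the given genes."""
    model = dlda_fit(X, y, genes)
    return sum(dlda_predict(model, x) != lab for x, lab in zip(X, y)) / len(y)


def greedy_forward(X, y, k):
    """Plain greedy forward selection (NOT RGFS): add the gene that most lowers training error."""
    chosen, remaining = [], list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda g: error_rate(X, y, chosen + [g]))  # ties: lowest index
        chosen.append(best)
        remaining.remove(best)
    return chosen


def loocv(X, y, k):
    """Leave-one-out CV with gene selection repeated inside every fold."""
    errors, gene_sets = 0, []
    for i in range(len(y)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = greedy_forward(X_tr, y_tr, k)
        gene_sets.append(tuple(sorted(genes)))
        errors += dlda_predict(dlda_fit(X_tr, y_tr, genes), X[i]) != y[i]
    return errors / len(y), gene_sets
```

On real data, `X` would be a samples-by-genes expression matrix from the merged studies and `y` the prognosis labels. Comparing the entries of `gene_sets` across folds gives a direct view of the instability the abstract reports.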
