ABSTRACT
With the advent of inexpensive microarray technology, biologists have become increasingly reliant on gene expression analysis for detecting disease states, including the diagnosis of cancerous tissue [12]. While random forests and support vector machines (SVMs) have proven popular for expression analysis, little work has compared them with AdaBoost, a widely used ensemble learning algorithm, across a broad range of cancer prediction tasks. Our results show that AdaBoost outperforms the other approaches on binary prediction tasks, while random forests and SVMs are the best choices for multi-class prediction.
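The abstract does not say how AdaBoost was configured. As a hedged illustration of the algorithm being compared, the sketch below implements discrete AdaBoost (Freund and Schapire) in plain Python, using single-gene decision stumps as weak learners. The choice of weak learner, the number of rounds, and every name here (`make_toy_expression_data`, `adaboost_fit`, and so on) are illustrative assumptions. The data are synthetic, with one hypothetical "marker gene", so nothing here reproduces the study's datasets, implementation, or results.

```python
import math
import random


def make_toy_expression_data(n_samples=40, n_genes=20, marker_gene=3, seed=0):
    """Synthetic, hypothetical expression matrix: Gaussian noise plus one
    marker gene shifted up in class +1 and down in class -1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1  # alternate labels so classes stay balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[marker_gene] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def stump_predict(row, gene, threshold, polarity):
    """Decision stump on one gene: +polarity above the threshold, -polarity otherwise."""
    return polarity if row[gene] > threshold else -polarity


def train_stump(X, y, w):
    """Exhaustively pick the (gene, threshold, polarity) with the lowest weighted error."""
    best = None  # (weighted_error, gene, threshold, polarity)
    for gene in range(len(X[0])):
        values = sorted({row[gene] for row in X})
        # Candidate thresholds: below the minimum, then midpoints between neighbors.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, gene, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, gene, threshold, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(alpha, gene, threshold, polarity)]."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, gene, threshold, polarity = train_stump(X, y, w)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        clipped = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - clipped) / clipped)
        ensemble.append((alpha, gene, threshold, polarity))
        if err == 0:  # training data already separated by this stump
            break
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, gene, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, g, t, p) for alpha, g, t, p in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(adaboost_predict(ensemble, row) == yi for row, yi in zip(X, y)) / len(y)


X, y = make_toy_expression_data()
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]
model = adaboost_fit(X_train, y_train, n_rounds=10)
test_acc = accuracy(model, X_test, y_test)
print(f"{len(model)} stump(s); first uses gene {model[0][1]}; held-out accuracy {test_acc:.2f}")
```

Because each stump thresholds a single gene, the fitted ensemble also doubles as a rough list of informative genes. A comparison like the one described above would additionally need random forest and SVM baselines evaluated under the same resampling protocol, and multi-class tasks would need a multi-class boosting variant or a one-vs-rest reduction.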
REFERENCES
- L. Breiman. Random forests. Machine Learning, 45(1):5--32, 2001.
- R. Díaz-Uriarte and S. A. De Andres. Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1):3, 2006.
- Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119--139, 1997.
- T. S. Furey, N. Cristianini, N. Duffy, et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906--914, 2000.
- N. Iizuka, M. Oka, H. Yamada-Okabe, et al. Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. The Lancet, 361(9361):923--929, 2003.
- R. Kuner, T. Muley, M. Meister, et al. Global gene expression analysis reveals specific patterns of cell junctions in non-small cell lung cancer subtypes. Lung Cancer, 63(1):32--38, 2009.
- A. Naderi, A. Teschendorff, N. Barbosa-Morais, et al. A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene, 26(10):1507--1516, 2007.
- D. T. Ross, U. Scherf, M. B. Eisen, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24:227--235, 2000.
- A. Sanchez-Palencia, M. Gomez-Morales, J. A. Gomez-Capilla, et al. Gene expression profiling reveals novel biomarkers in nonsmall cell lung cancer. International Journal of Cancer, 129(2):355--364, 2011.
- A. Statnikov, C. F. Aliferis, I. Tsamardinos, et al. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 21(5):631--643, 2005.
- A. Statnikov, L. Wang, and C. F. Aliferis. A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics, 9(1):319, 2008.
- A. C. Tan and D. Gilbert. Ensemble machine learning on gene expression data for cancer classification. Applied Bioinformatics, 2:S75--83, 2003.
- E. Tian, F. Zhan, R. Walker, et al. The role of the Wnt-signaling antagonist DKK1 in the development of osteolytic lesions in multiple myeloma. New England Journal of Medicine, 349(26):2483--2494, 2003.
- V. N. Vapnik. Statistical Learning Theory, volume 1. Wiley, New York, 1998.
Index Terms
- A comprehensive analysis of classification algorithms for cancer prediction from gene expression