Comparing Random Forests and Support Vector Machines for Cancer Classification
Author Information
Author(s): Alexander Statnikov, Lily Wang, Constantin F. Aliferis
Primary Institution: Vanderbilt University
Hypothesis
Methodological biases in prior work may have compromised the conclusions about the performance of random forests compared to support vector machines for microarray gene expression data classification.
Conclusion
Support vector machines outperform random forests in the majority of microarray datasets for cancer classification.
Supporting Evidence
- Support vector machines significantly outperformed random forests in 7 datasets.
- On average, support vector machines achieved higher area under the ROC curve (AUC) and relative classifier information (RCI) scores than random forests.
- The study corrected methodological biases found in prior comparisons.
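To make the AUC metric in the evidence above concrete, here is a minimal sketch of how it can be computed from classifier scores. It uses the pairwise-ranking definition: the fraction of positive/negative sample pairs in which the positive sample receives the higher score, with ties counted as half. The function name `auc` and the toy labels and scores are illustrative assumptions, not code or data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not from the study).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(toy_labels, toy_scores)
```

In the toy example, 8 of the 9 positive/negative pairs are ranked correctly, so `toy_auc` is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.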
Takeaway
This study shows that when classifying cancer from gene expression data, support vector machines generally work better than random forests.
Methodology
The study used 22 diagnostic and prognostic datasets to compare support vector machines and random forests, applying rigorous statistical tests to evaluate differences in their classification performance.
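As an illustration of how two classifiers can be compared across several datasets, the sketch below runs a paired sign-flip permutation test on per-dataset scores. This summary does not name the statistical test the authors used, so the choice of test here is an assumption. The function name `paired_permutation_p_value` and the toy scores are hypothetical as well.

```python
from itertools import product


def paired_permutation_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired scores.

    Under the null hypothesis that both methods perform equally, the sign
    of each per-dataset difference is arbitrary. The p-value is the share
    of sign assignments whose absolute summed difference is at least as
    large as the observed one. Exact enumeration costs 2**n, so this is
    only practical for a modest number of datasets.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))

    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Toy per-dataset scores (made-up values, not results from the study).
toy_method_a = [0.95, 0.90, 0.88, 0.92, 0.85, 0.91]
toy_method_b = [0.90, 0.86, 0.85, 0.87, 0.80, 0.88]
toy_p = paired_permutation_p_value(toy_method_a, toy_method_b)
```

Every toy difference favours method A, so only the two assignments in which all signs agree reach the observed statistic, giving a p-value of 2/64 (about 0.031). With many more datasets, exact enumeration becomes expensive and a random sample of sign assignments is the usual substitute.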
Potential Biases
Prior comparisons may have favored random forests due to methodological biases.
Limitations
The study focused solely on classification performance and did not consider the number of selected genes in the comparison metrics.
Participant Demographics
The datasets covered a range of cancer types, with sample sizes ranging from 50 to 308.
Statistical Information
P-Value
0.008
Statistical Significance
p<0.05