A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification
2008

Comparing Random Forests and Support Vector Machines for Cancer Classification

Sample size: 22 datasets | Publication | 10 minutes | Evidence: high

Author Information

Author(s): Alexander Statnikov, Lily Wang, Constantin F. Aliferis

Primary Institution: Vanderbilt University

Hypothesis

Methodological biases in prior work may have compromised the conclusions about the performance of random forests compared to support vector machines for microarray gene expression data classification.

Conclusion

Support vector machines outperform random forests in the majority of microarray datasets for cancer classification.

Supporting Evidence

  • Support vector machines significantly outperformed random forests in 7 datasets.
  • On average, support vector machines achieved higher AUC and RCI scores than random forests.
  • The study corrected methodological biases found in prior comparisons.

Takeaway

This study shows that, for classifying cancer from microarray gene expression data, support vector machines generally work better than random forests.

Methodology

The study compared support vector machines and random forests on 22 diagnostic and prognostic microarray datasets, measuring classification performance by AUC and RCI and applying rigorous statistical tests to the differences.
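
As a rough illustration of this kind of comparison, the sketch below scores both classifiers on identical cross-validation folds using AUC. It is not the study's protocol: the synthetic data from scikit-learn's `make_classification`, the linear kernel, `C=1.0`, the 200 trees, the 5 folds, and the helper name `compare_svm_rf` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_svm_rf(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated AUC of an SVM and a random forest.

    Hypothetical helper: both models are scored on the same stratified
    folds so that their AUC values are directly comparable.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        # Illustrative hyperparameters, not those used in the study.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
        "rf": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())
        for name, model in models.items()
    }


# Synthetic stand-in for a microarray dataset: few samples, many features.
X, y = make_classification(
    n_samples=100, n_features=500, n_informative=20, random_state=0
)
scores = compare_svm_rf(X, y)
print(scores)
```

Sharing one `StratifiedKFold` splitter between the two models keeps the comparison paired, which matters when the per-dataset differences are later tested for significance.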

Potential Biases

Prior comparisons may have favored random forests due to methodological biases.

Limitations

The study focused solely on classification performance and did not consider the number of selected genes in the comparison metrics.

Participant Demographics

The datasets covered a range of cancer types, with sample sizes ranging from 50 to 308.

Statistical Information

P-Value

0.008

Statistical Significance

p<0.05
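
One common way to obtain a p-value for a paired comparison across datasets is a sign-flip permutation test on the per-dataset score differences. The sketch below shows that idea only. The summary does not say which test produced the reported value, so the choice of test, the helper name `paired_sign_flip_test`, and the example differences are all assumptions.

```python
import random


def paired_sign_flip_test(diffs, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the mean of paired differences.

    Under the null hypothesis that neither method is better, the sign of
    each per-dataset difference is equally likely to be + or -.
    """
    rng = random.Random(seed)
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Randomly flip the sign of each difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-dataset differences (SVM score minus RF score), one per dataset.
example_diffs = [0.04, 0.01, 0.06, -0.02, 0.03, 0.05, 0.00, 0.02, 0.07, -0.01, 0.03]
p_value = paired_sign_flip_test(example_diffs)
print(f"p = {p_value:.4f}")
```

A p-value below the 0.05 threshold would indicate that a mean difference this large is unlikely if the two methods performed equally well.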

Digital Object Identifier (DOI)

10.1186/1471-2105-9-319
