Automating document classification for the Immune Epitope Database
2007

Sample size: 20,910 publications
Evidence: high

Author Information

Author(s): Wang Peng, Morgan Alexander, Zhang Qing, Sette Alessandro, Peters Bjoern

Primary Institution: The La Jolla Institute for Allergy and Immunology

Hypothesis

Can automated text classification improve the efficiency of the Immune Epitope Database curation process?

Conclusion

The implementation of text classification has sped up the reference selection process without sacrificing sensitivity or specificity.

Supporting Evidence

  • The Naïve Bayes classifier achieved 95% sensitivity and specificity in classifying abstracts.
  • Using additional features from PubMed improved the classifier's performance.
  • The study provides a large dataset that can serve as a benchmark for tool developers.
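
As a reminder of what the sensitivity and specificity figures above measure, the following is a minimal sketch of how the two metrics are computed from expert labels and classifier predictions. The function name and the toy labels are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for boolean relevance labels.

    Assumes both classes occur at least once in true_labels; otherwise
    one of the divisions below would be by zero.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1  # relevant abstract correctly kept
        elif truth and not predicted:
            fn += 1  # relevant abstract wrongly discarded
        elif not truth and not predicted:
            tn += 1  # irrelevant abstract correctly discarded
        else:
            fp += 1  # irrelevant abstract wrongly kept
    sensitivity = tp / (tp + fn)  # share of relevant abstracts recovered
    specificity = tn / (tn + fp)  # share of irrelevant abstracts rejected
    return sensitivity, specificity


# Toy data, invented for illustration only.
expert = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]
sens, spec = sensitivity_specificity(expert, predicted)
```

With these toy labels, one relevant abstract is missed and one irrelevant abstract is kept, so both metrics come out at 0.75.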

Takeaway

This study shows how computers can help scientists quickly find important information in research papers about immune responses.

Methodology

Naïve Bayes classifiers were trained on 20,910 abstracts classified by domain experts to automate the classification of abstracts as relevant, irrelevant, or uncertain.
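
The approach described above can be illustrated with a minimal sketch, assuming a two-class multinomial Naïve Bayes model with add-one smoothing and a pair of probability cutoffs that define the "uncertain" band. The tokenizer, the cutoff values (0.2 and 0.8), the function names, and the toy abstracts are all assumptions made for illustration; the study's actual features and thresholds are not given here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Count words per class; labels are True (relevant) or False."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def relevance_probability(model, text):
    """Posterior probability that a text is relevant, with add-one smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label in (True, False):
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(model["doc_counts"][label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        log_scores[label] = score
    # Normalise in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


def classify(model, text, low=0.2, high=0.8):
    """Map the posterior to relevant / irrelevant / uncertain (assumed cutoffs)."""
    p = relevance_probability(model, text)
    if p >= high:
        return "relevant"
    if p <= low:
        return "irrelevant"
    return "uncertain"


# Toy training set, invented for illustration only.
docs = [
    "t cell epitope binding mhc peptide",
    "antibody epitope mapping peptide",
    "crop yield soil nitrogen",
    "bridge steel load stress",
]
labels = [True, True, False, False]
model = train_naive_bayes(docs, labels)
```

In a setup like this, texts that land in the "uncertain" band would be passed to a human curator, while the other two outcomes could be accepted automatically. For example, `classify(model, "epitope soil")` mixes vocabulary from both classes and falls between the two cutoffs.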

Potential Biases

Because the classifier is trained on labels assigned by domain experts, it may inherit any systematic bias in those expert classifications.

Limitations

The inherent disagreement rate in expert classifications limits the accuracy of the automated classification.

Participant Demographics

The dataset consists of PubMed abstracts related to immune epitopes, classified by domain experts.

Statistical Information

P-Value

0.0021

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-8-269
