Multi-label literature classification based on the Gene Ontology graph
2008

Improving Literature Classification with Gene Ontology

Sample size: 36,423 publications

Evidence: high

Author Information

Author(s): Jin Bo, Muller Brian, Zhai Chengxiang, Lu Xinghua

Primary Institution: Medical University of South Carolina

Hypothesis

Can graph-based multi-label classification methods enhance the automatic annotation of biomedical literature using the Gene Ontology graph?

Conclusion

Graph-based multi-label classification methods significantly outperform conventional flat multi-label classification approaches for literature-based protein annotation.

Supporting Evidence

  • Graph-based methods significantly improve predictions of Gene Ontology terms relative to the flat multi-label baseline.
  • The study utilized a dataset of 36,423 MEDLINE entries for evaluation.
  • Graph-based classifiers can suggest annotations closely related to true annotations.
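
To make "closely related" concrete, one plausible measure (an assumption here, not necessarily the one used in the study) is the number of edges separating a predicted term from a true term in the ontology graph, ignoring edge direction. The minimal Python sketch below computes that distance with breadth-first search over a hypothetical toy ontology fragment; the names `PARENTS` and `graph_distance` and the term labels are illustrative only.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy ontology fragment (term -> parent terms); not the real Gene Ontology.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}


def graph_distance(a: str, b: str, parents: Dict[str, List[str]]) -> Optional[int]:
    """Edges on the shortest path between two terms, ignoring edge direction."""
    # Build an undirected adjacency list from the child -> parent links.
    adj: Dict[str, Set[str]] = {t: set() for t in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
    # Breadth-first search from `a`; the first time `b` is dequeued, its distance is minimal.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        t = queue.popleft()
        if t == b:
            return dist[t]
        for n in adj[t]:
            if n not in dist:
                dist[n] = dist[t] + 1
                queue.append(n)
    return None  # `b` is not reachable from `a`


print(graph_distance("A1", "A2", PARENTS))  # 2 (A1 - A - A2)
print(graph_distance("A1", "B", PARENTS))   # 3 (A1 - A - root - B)
```

Under this measure, predicting A2 when the true term is A1 counts as a nearer miss than predicting B, which is the kind of "closely related" suggestion a graph-aware evaluation could credit.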

Takeaway

This study shows that exploiting the structure of the Gene Ontology graph helps automated classifiers assign Gene Ontology terms to scientific papers about proteins more accurately than treating each term independently.

Methodology

The study evaluated three graph-based multi-label classification algorithms against a conventional flat multi-label algorithm using a dataset of biomedical literature.
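
The summary does not describe the three graph-based algorithms, so the following is only a minimal Python sketch of the general contrast between flat and graph-aware prediction, not the authors' method. It assumes a hypothetical toy ontology (`PARENTS`), made-up classifier scores, and a 0.5 threshold. The flat predictor makes an independent yes/no decision per term; the graph-aware variant adds one simple, commonly used consistency step, the "true-path rule", under which every ancestor of a predicted term is also predicted.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology fragment (term -> parent terms); not the real Gene Ontology.
# "A1x" has two parents, so the structure is a DAG rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "A1x": ["A1", "B"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `term`, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def flat_predict(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Flat multi-label prediction: each term is thresholded independently."""
    return {t for t, s in scores.items() if s >= threshold}


def graph_consistent_predict(
    scores: Dict[str, float], parents: Dict[str, List[str]], threshold: float = 0.5
) -> Set[str]:
    """Flat prediction closed upward so each predicted term's ancestors are included."""
    predicted = flat_predict(scores, threshold)
    for t in list(predicted):  # iterate over a copy while extending the set
        predicted |= ancestors(t, parents)
    return predicted


# Made-up per-term scores, e.g. from independent per-term classifiers.
scores = {"root": 0.9, "A": 0.3, "B": 0.2, "A1": 0.4, "A2": 0.1, "A1x": 0.8}
print(sorted(flat_predict(scores)))                       # ['A1x', 'root']
print(sorted(graph_consistent_predict(scores, PARENTS)))  # ['A', 'A1', 'A1x', 'B', 'root']
```

Here the flat predictor returns A1x without its parents, an annotation set the ontology itself rules out; the upward closure repairs that. The study's graph-based classifiers may use the graph in quite different ways, for example during training rather than as post-processing.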

Limitations

The methods may need further improvement before they meet real-world annotation needs, and their performance depends on the quality of the training data.

Digital Object Identifier (DOI)

10.1186/1471-2105-9-525
