Evaluating Automated Classification of Human Disease Genes
Author Information
Author(s): Chen James L, Liu Yang, Sam Lee T, Li Jianrong, Lussier Yves A
Primary Institution: Georgetown University Hospital
Hypothesis
We hypothesized that manually selecting Gene Ontology (GO) classes homologous to Valle's categories of protein function would recapitulate Valle's classification of genes.
Conclusion
Automated methods can recapitulate a significant portion of Valle's classification of human disease genes.
Supporting Evidence
- The two automated methods achieved overall precision of 56% and 47%, with recall of 62% and 71%, respectively.
- For some protein function categories, such as 'hormone' and 'transcription factor', the automated methods performed particularly well, achieving precision and recall levels above 75%.
- The study highlights the need for significant progress in Gene Ontology annotations for better classification.
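To make the precision and recall figures above concrete, here is a minimal sketch of how the two measures are computed for a single category. The gene identifiers and the helper names (`precision_recall`, `f1`) are hypothetical and are not taken from the study; the F1 score is included only as a common way to combine the two numbers.

```python
def precision_recall(predicted, reference):
    """Compare an automated gene set for one category with the manual one.

    predicted: set of genes the automated method put in the category
    reference: set of genes the manual classification put in the category
    """
    true_positives = len(predicted & reference)
    # Precision: share of automated assignments that agree with the manual set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the manual set that the automated method recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data for one category.
predicted = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
reference = {"GENE_A", "GENE_B", "GENE_E"}

p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")
```

Assuming "respectively" pairs the first precision with the first recall, applying `f1` to the overall figures gives roughly 0.59 for (56%, 62%) and 0.57 for (47%, 71%). These F1 values are derived here for illustration and are not reported in the summary.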
Takeaway
The study looked at how well computers can sort human disease genes into categories using existing data, and found that they can do a pretty good job.
Methodology
Two automated methods were applied: one used the structure of the Gene Ontology to classify genes, and the other used an information-theoretic distance to cluster genes.
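The two ideas described above can be sketched on a toy ontology. This is an illustration under stated assumptions, not the study's implementation: the term names, gene annotations, and category mapping are hypothetical, and the Jiang-Conrath distance is used as a stand-in because the summary does not say which information-theoretic distance was applied.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "dna_binding": ["binding"],
    "transcription_factor": ["dna_binding"],
    "zinc_finger_tf": ["transcription_factor"],
    "signaling": ["root"],
    "hormone": ["signaling"],
    "peptide_hormone": ["hormone"],
}

# Hypothetical manual mapping from ontology classes to function categories.
CATEGORY_CLASSES = {
    "transcription_factor": "transcription factor",
    "hormone": "hormone",
}

# Hypothetical gene annotations.
GENE_ANNOTATIONS = {
    "GENE_A": ["zinc_finger_tf"],
    "GENE_B": ["transcription_factor"],
    "GENE_C": ["peptide_hormone"],
    "GENE_D": ["dna_binding"],
}


def ancestors(term):
    """Return the term together with all of its ancestors in the ontology."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def classify(annotations):
    """Structure-based step: a gene joins a category when one of its terms
    is a selected class or a descendant of one."""
    categories = set()
    for term in annotations:
        for ancestor in ancestors(term):
            if ancestor in CATEGORY_CLASSES:
                categories.add(CATEGORY_CLASSES[ancestor])
    return categories


def information_content():
    """IC(t) = -log p(t), where p(t) is the fraction of genes annotated to t
    or to any of its descendants."""
    counts = {}
    for terms in GENE_ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] = counts.get(term, 0) + 1
    total = len(GENE_ANNOTATIONS)
    return {term: -math.log(n / total) for term, n in counts.items()}


def jiang_conrath(term_a, term_b, ic):
    """Distance = IC(a) + IC(b) - 2 * IC(most informative common ancestor)."""
    common = ancestors(term_a) & ancestors(term_b)
    mica_ic = max(ic[t] for t in common)
    return ic[term_a] + ic[term_b] - 2 * mica_ic


ic = information_content()
for gene, terms in GENE_ANNOTATIONS.items():
    print(gene, sorted(classify(terms)))
print(jiang_conrath("zinc_finger_tf", "transcription_factor", ic))
print(jiang_conrath("zinc_finger_tf", "peptide_hormone", ic))
```

In this toy setup, a gene annotated only with a class above the selected ones (here `dna_binding`) receives no category, and terms in the same branch end up closer together than terms that share only the root. A clustering step would then group genes by such pairwise distances.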
Limitations
About 15% of the human disease genes studied remain without GO annotations, and the automated methods performed poorly on some categories because of ambiguity.