Estimating Error Rates in Gene Ontology Annotations
Author Information
Author(s): Craig E. Jones, Alfred L. Brown, Ute Baumann
Primary Institution: University of Adelaide
Hypothesis
What is the error rate of curated Gene Ontology (GO) sequence annotations?
Conclusion
The overall error rate of curated GO term annotations is estimated to be between 28% and 30%. Annotations carrying the ISS (Inferred from Sequence or Structural Similarity) evidence code are considerably less reliable than non-ISS annotations.
Supporting Evidence
- The error rate of curated non-ISS annotations was found to be between 13% and 18%.
- ISS annotations had a significantly higher estimated error rate, at 49%.
- The study provides a systematic approach to estimating annotation errors in sequence databases.
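The overall rate and the two subset rates are linked: if the overall figure is read as a weighted average of the ISS and non-ISS rates, each rate is weighted by that subset's share of all annotations. The sketch below shows that arithmetic. The ISS share of 0.4 is a hypothetical value chosen for illustration, not a figure reported in this summary. The non-ISS rate is the midpoint of the reported 13% to 18% range. The function names are also illustrative.

```python
# Hypothetical illustration: how subset error rates combine into an overall rate.
# The ISS share used below is an assumed value, not a figure from the study.


def overall_error_rate(iss_share: float, iss_rate: float, non_iss_rate: float) -> float:
    """Weighted average of subset error rates, weighted by each subset's share."""
    if not 0.0 <= iss_share <= 1.0:
        raise ValueError("iss_share must be between 0 and 1")
    return iss_share * iss_rate + (1.0 - iss_share) * non_iss_rate


def implied_iss_share(overall_rate: float, iss_rate: float, non_iss_rate: float) -> float:
    """Invert the weighted average: the ISS share consistent with an overall rate."""
    if iss_rate == non_iss_rate:
        raise ValueError("subset rates must differ to infer a share")
    return (overall_rate - non_iss_rate) / (iss_rate - non_iss_rate)


if __name__ == "__main__":
    # ISS rate 49% from the summary; non-ISS 15.5% is the midpoint of 13-18%.
    rate = overall_error_rate(iss_share=0.4, iss_rate=0.49, non_iss_rate=0.155)
    print(f"overall error rate: {rate:.3f}")  # prints "overall error rate: 0.289"
```

With these assumed inputs, the weighted average lands inside the reported 28% to 30% band. The inverse function shows which ISS share a given overall rate would imply under the same weighted-average reading.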
Takeaway
This study looked at how often mistakes are made when labeling what genes do. It found that about 3 out of 10 labels might be wrong, and labels based on similarity to other genes were the most error-prone.
Methodology
The study developed a method to estimate error rates by deliberately adding known errors to sets of annotations. It then used regression to analyze how precision changes as more errors are introduced.
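The general idea of inferring an unknown error rate from deliberately injected errors can be illustrated with a "standard addition"-style linear extrapolation. This is a swapped-in analogue and may differ from the authors' actual model. The sketch makes three assumptions. First, precision falls linearly with the total error fraction, P = 1 - k(e0 + f), where f is the injected fraction and e0 is the unknown baseline rate. Second, injected errors add to existing ones, which only holds approximately at low rates. Third, an error-free set would score a precision of 1.0. The data points are synthetic and are not results from the study.

```python
# Illustrative sketch, not the authors' exact method: estimate a baseline error
# rate by injecting known extra errors and extrapolating a least-squares line.
# Assumed model: precision = 1 - k * (e0 + f), with f the injected error fraction.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("injected error fractions must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_baseline_error(injected: list[float], precision: list[float]) -> float:
    """Extrapolate the fitted line back to where precision would reach 1.0."""
    slope, intercept = fit_line(injected, precision)
    if slope >= 0:
        raise ValueError("precision should fall as errors are injected")
    # Under the assumed model, intercept = 1 - k*e0 and slope = -k,
    # so the baseline error rate is e0 = (1 - intercept) / -slope.
    return (1.0 - intercept) / -slope


if __name__ == "__main__":
    # Synthetic points generated from the model with e0 = 0.2 and k = 0.5.
    injected = [0.0, 0.1, 0.2, 0.3]
    precision = [0.9, 0.85, 0.8, 0.75]
    print(f"estimated baseline error: {estimate_baseline_error(injected, precision):.2f}")
```

On these synthetic points the fit recovers a slope of -0.5 and an intercept of 0.9. The estimate is therefore about 0.20. In practice, the observed precision would be noisy, and the estimate is only as good as the assumed linear relationship.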
Potential Biases
The estimates may be biased by their reliance on curated sources and by the assumptions made about how errors propagate.
Limitations
The study relies on assumptions about the distribution of errors and semantic variations, which may not fully capture the complexity of annotation errors.
Statistical Information
P-Value
p<0.001
Statistical Significance
Significant (p<0.001)
Digital Object Identifier (DOI)