Estimating the annotation error rate of curated GO database sequence annotations
2007

Estimating Error Rates in Gene Ontology Annotations

Sample size: 59251

Evidence: moderate

Author Information

Author(s): Craig E. Jones, Alfred L. Brown, Ute Baumann

Primary Institution: University of Adelaide

Hypothesis

What is the error rate of curated Gene Ontology (GO) sequence annotations?

Conclusion

The overall error rate of curated GO term annotations is estimated to be between 28% and 30%, with non-ISS annotations having a lower error rate than ISS annotations.

Supporting Evidence

  • The error rate of curated non-ISS annotations was found to be between 13% and 18%.
  • ISS annotations had a significantly higher error rate of 49%.
  • The study provides a systematic approach to estimating annotation errors in sequence databases.

Takeaway

This study looked at how often mistakes are made in labeling genes and found that about 3 out of 10 labels might be wrong, especially if they were based on similarities to other genes.

Methodology

The authors estimated error rates by deliberately inserting errors at known rates into the annotations, measuring how precision changed as errors were added, and fitting a regression to that relationship.
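The error-insertion idea can be illustrated with a small sketch. This is not the authors' implementation. It assumes, for illustration only, that precision falls linearly with the total error rate, that precision would equal a chosen maximum (here 1.0) if there were no errors, and that inserted errors add directly to the unknown base error rate. The function names, the slope, and the synthetic data are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_base_error(added_rates, precisions, max_precision=1.0):
    """Extrapolate the fitted line back to max_precision.

    Assumed model: precision = max_precision + slope * (base + added),
    so the intercept equals max_precision + slope * base.
    """
    slope, intercept = fit_line(added_rates, precisions)
    return (intercept - max_precision) / slope


def simulate_precisions(base_error, added_rates, slope=-0.8,
                        noise_sd=0.005, seed=0, max_precision=1.0):
    """Generate synthetic precision values under the assumed linear model."""
    rng = random.Random(seed)
    return [
        max_precision + slope * (base_error + a) + rng.gauss(0.0, noise_sd)
        for a in added_rates
    ]


if __name__ == "__main__":
    # Hypothetical experiment: insert errors at rates 0%, 5%, ..., 40%.
    added = [i * 0.05 for i in range(9)]
    observed = simulate_precisions(base_error=0.3, added_rates=added)
    print(f"estimated base error rate: {estimate_base_error(added, observed):.3f}")
```

The estimate depends heavily on the assumed maximum precision: if correct annotations cannot reach a precision of 1.0, for example because of semantic variation between equivalent terms, setting `max_precision` too high would inflate the estimated error rate.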

Potential Biases

The estimates may be biased by their reliance on curated sources and by the assumptions made about how errors propagate between annotations.

Limitations

The study relies on assumptions about the distribution of errors and semantic variations, which may not fully capture the complexity of annotation errors.

Statistical Information

P-Value

p<0.001

Digital Object Identifier (DOI)

10.1186/1471-2105-8-170
