Semi-supervised protein subcellular localization
2009

Sample size: 7579
Evidence: high

Author Information

Author(s): Xu Qian, Hu Derek Hao, Xue Hong, Yu Weichuan, Yang Qiang

Primary Institution: Hong Kong University of Science and Technology

Hypothesis

Can the same or better prediction accuracy be achieved with less labeled data?

Conclusion

The study demonstrates that a semi-supervised learning approach can improve prediction accuracy for protein subcellular localization while using significantly fewer labeled instances.

Supporting Evidence

  • The proposed method improves on the state-of-the-art prediction results of SVM classifiers by more than 10%.
  • With only about 20% of the labeled instances, prediction accuracy exceeds 75%.
  • The CoForest algorithm effectively utilizes unlabeled data to improve prediction accuracy.

Takeaway

This study shows a way to predict where proteins are located in cells using less labeled data, making it easier and faster to understand protein functions.

Methodology

The study uses a semi-supervised learning framework to refine a classifier with unlabeled data after initially training it on a small set of labeled examples.
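The general idea of refining a classifier with unlabeled data can be illustrated with a minimal self-training sketch. This is not the CoForest algorithm used in the study: it swaps in a single nearest-centroid classifier in place of an ensemble of trees, and every name, threshold, feature vector, and location label below is a hypothetical chosen for illustration only.

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(X, y):
    """Toy nearest-centroid classifier: one centroid per class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_class.items()}


def predict_with_margin(model, x):
    """Return (label, margin); margin is the distance gap between the two closest centroids."""
    ranked = sorted((sq_dist(x, c), label) for label, c in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=5):
    """Train on the labeled set, then repeatedly absorb confidently predicted unlabeled points."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= min_margin:  # confident: treat the prediction as a label
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        if added == 0:  # nothing new was learned, so stop early
            break
        pool = remaining
        model = fit(X_lab, y_lab)  # refine the classifier on the enlarged set
    return model


# Hypothetical 2-D features: two labeled proteins, five unlabeled ones.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["cytoplasm", "nucleus"]
X_unlab = [[0.5, 0.2], [0.1, 0.8], [3.6, 4.2], [4.3, 3.7], [2.0, 2.0]]

model = self_train(X_lab, y_lab, X_unlab)
label, _ = predict_with_margin(model, [0.3, 0.3])
print(label)
```

The confidence threshold (`min_margin`) is the main design choice in a loop like this: set too low, wrong pseudo-labels contaminate the training set; set too high, the unlabeled data is never used.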

Limitations

The study does not explore the effect of varying the number of labeled instances when training the SVM classifier.

Digital Object Identifier (DOI)

10.1186/1471-2105-10-S1-S47
