Semi-supervised protein subcellular localization
Author Information
Author(s): Xu Qian, Hu Derek Hao, Xue Hong, Yu Weichuan, Yang Qiang
Primary Institution: Hong Kong University of Science and Technology
Hypothesis
Can the same or better prediction accuracy be achieved with less labeled data?
Conclusion
The study demonstrates that a semi-supervised learning approach can improve prediction accuracy for protein subcellular localization while using significantly fewer labeled instances.
Supporting Evidence
- The proposed method improves on the state-of-the-art prediction results of SVM classifiers by more than 10%.
- Using only about 20% of the labeled instances, the prediction accuracy exceeds 75%.
- The CoForest algorithm effectively utilizes unlabeled data to improve prediction accuracy.
Takeaway
This study shows a way to predict where proteins are located in cells using less labeled data, making it easier and faster to understand protein functions.
Methodology
The study uses a semi-supervised learning framework to refine a classifier with unlabeled data after initially training it on a small set of labeled examples.
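The general idea of training on a small labeled set and then refining with unlabeled data can be illustrated with a minimal self-training sketch. This is not the CoForest algorithm used in the study: it swaps in a simple nearest-centroid classifier, and the function names, the confidence margin, the threshold, and the toy 2-D feature vectors are all assumptions made for illustration only.

```python
import math

def centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_with_confidence(cents, x):
    """Return (label, confidence) for x; needs at least two classes.

    Confidence is a hypothetical margin score: 1 - d_nearest / d_second.
    """
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = 1.0 - d1 / d2 if d2 > 0 else 0.0
    return label, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_rounds=10):
    """Refine a classifier by pseudo-labeling confident unlabeled points."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)          # retrain on the current labeled set
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:        # confident: adopt the pseudo-label
                X.append(x)
                y.append(label)
                added += 1
            else:                        # uncertain: leave it in the pool
                remaining.append(x)
        pool = remaining
        if added == 0:                   # nothing new learned, so stop
            break
    return centroids(X, y)

# Toy data (made up): two labeled proteins, four unlabeled ones.
X_lab = [(0.0, 0.0), (4.0, 4.0)]
y_lab = ["cytoplasm", "nucleus"]
X_unlab = [(0.5, 0.2), (3.8, 4.1), (0.1, 0.6), (4.2, 3.7)]

model = self_train(X_lab, y_lab, X_unlab)
label_a, _ = predict_with_confidence(model, (0.3, 0.3))
label_b, _ = predict_with_confidence(model, (3.9, 3.9))
print(label_a, label_b)
```

In this sketch the unlabeled points shift each class centroid away from the two original labeled examples, which is the sense in which unlabeled data refines the initial classifier. A co-training ensemble such as CoForest would instead have the members of the ensemble label confident examples for one another.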
Limitations
The study does not explore how varying the number of labeled instances affects the training of the SVM classifier.
Digital Object Identifier (DOI)