Clustering Protein Environments for Function Prediction
Author Information
Author(s): Yoon Sungroh, Ebert Jessica C, Chung Eui-Young, De Micheli Giovanni, Altman Russ B
Primary Institution: Stanford University
Hypothesis
Proteins contain a large number of function-associated sites that have not yet been recognized.
Conclusion
The study demonstrates that clustering protein environments can identify novel structural or functional sites.
Supporting Evidence
- The clustering method rediscovered known 3D environments associated with PROSITE motifs.
- The study defined 4,550 clusters from nearly 2 million environments.
- The clustering algorithm detected clusters that capture known 1D motifs from PROSITE.
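The summary does not say how the authors measured whether a cluster "captures" a motif. One simple way to quantify that kind of agreement, offered only as an illustrative sketch, is to compare the set of environments in a cluster with the set of environments annotated with a motif. The function name `motif_overlap` and the environment identifiers below are hypothetical and do not come from the study.

```python
def motif_overlap(cluster_members, motif_sites):
    """Return (precision, recall) of a cluster with respect to a motif.

    precision: fraction of the cluster's environments that carry the motif.
    recall:    fraction of the motif's environments that fall in the cluster.
    Both inputs are iterables of environment identifiers (hypothetical here).
    """
    cluster_members = set(cluster_members)
    motif_sites = set(motif_sites)
    shared = cluster_members & motif_sites
    precision = len(shared) / len(cluster_members) if cluster_members else 0.0
    recall = len(shared) / len(motif_sites) if motif_sites else 0.0
    return precision, recall


# Made-up identifiers, for illustration only.
example_cluster = ["env1", "env2", "env3", "env4"]
example_motif = ["env2", "env3", "env4", "env5", "env6"]
example_precision, example_recall = motif_overlap(example_cluster, example_motif)
```

A cluster with both high precision and high recall for a motif would be a candidate for "rediscovering" it; any cutoff for what counts as high is a choice this summary does not specify.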
Takeaway
The researchers found a way to group similar parts of proteins together, which helps in figuring out what those parts do, even if we didn't know about them before.
Methodology
The study used K-means clustering on nearly 2 million protein microenvironments to identify clusters associated with known PROSITE motifs.
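To make the clustering step concrete, here is a minimal sketch of standard K-means (Lloyd's algorithm) in NumPy. It is not the authors' implementation: the 8-dimensional synthetic feature vectors, the initialization from randomly chosen points, and the parameter names `k`, `n_iter`, and `seed` are all assumptions made for illustration. The study's actual feature representation and settings are not described in this summary.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain K-means on an (n_points, n_features) array.

    Returns (centroids, labels). Initialization picks k distinct input
    points at random, which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        diffs = points[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it;
        # a centroid with no assigned points stays where it is.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # Centroids stopped moving, so the assignment is stable.
        centroids = new_centroids
    return centroids, labels


# Synthetic stand-in for microenvironment feature vectors:
# two well-separated groups of 50 points in 8 dimensions.
demo_rng = np.random.default_rng(42)
demo_points = np.vstack([
    demo_rng.normal(0.0, 0.5, size=(50, 8)),
    demo_rng.normal(10.0, 0.5, size=(50, 8)),
])
demo_centroids, demo_labels = kmeans(demo_points, k=2)
```

This version builds the full point-by-centroid distance array in memory, which is fine for a toy example but would likely need batching or a library implementation at the scale of nearly 2 million environments.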
Potential Biases
The reliance on known PROSITE motifs may introduce bias in identifying novel sites.
Limitations
The study primarily focused on known motifs and may not capture all functional sites.
Participant Demographics
The study analyzed 9,600 nonredundant protein chains from the Protein Data Bank.
Statistical Information
P-Value
0.02
Statistical Significance
p<0.05
Digital Object Identifier (DOI)