Automated Alphabet Reduction for Protein Datasets
Author Information
Author(s): Jaume Bacardit, Michael Stout, Jonathan D Hirst, Alfonso Valencia, Robert E Smith, Natalio Krasnogor
Primary Institution: University of Nottingham
Hypothesis
Can automated and generic alphabet reduction techniques improve protein structure prediction without losing key biochemical information?
Conclusion
The automated alphabet reduction protocol generates competent reduced alphabets tailored specifically for a variety of protein datasets without requiring domain knowledge.
Supporting Evidence
- The five-letter alphabets gave prediction accuracies statistically similar to those obtained using the full amino acid alphabet.
- The automatically designed alphabets outperformed other reduced alphabets taken from the literature.
- The performance gap between the full representation and the reduced representation was small.
Takeaway
This study shows that the 20-letter amino acid alphabet can be reduced automatically to a much smaller one, making protein data easier to analyze while retaining the information needed for prediction.
Methodology
The study used an automated method to perform alphabet reduction, applying it to predict contact number and relative solvent accessibility in proteins.
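To make the idea of alphabet reduction concrete, here is a minimal sketch of how a reduced alphabet is applied to a sequence once the grouping is known. The five-group partition below is a hypothetical illustration based on common physicochemical classes; it is not one of the alphabets produced by the study, and the names `EXAMPLE_GROUPS`, `build_mapping`, and `reduce_sequence` are assumptions introduced here. The automated search for a good grouping, which is the study's actual contribution, is not shown.

```python
# Hypothetical five-group partition of the 20 standard amino acids.
# Illustrative only: NOT one of the alphabets reported in the study.
EXAMPLE_GROUPS = {
    "a": "AVLIMC",  # aliphatic and sulfur-containing
    "b": "FWYH",    # aromatic
    "c": "STNQ",    # polar, uncharged
    "d": "KRDE",    # charged
    "e": "GP",      # glycine and proline
}


def build_mapping(groups):
    """Invert {reduced_letter: residues} into {residue: reduced_letter}."""
    mapping = {}
    for letter, residues in groups.items():
        for residue in residues:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} assigned to two groups")
            mapping[residue] = letter
    return mapping


def reduce_sequence(sequence, mapping, unknown="x"):
    """Rewrite a one-letter amino acid sequence in the reduced alphabet.

    Residues absent from the mapping (for example ambiguity codes)
    are replaced by the `unknown` symbol.
    """
    return "".join(mapping.get(residue, unknown) for residue in sequence.upper())


mapping = build_mapping(EXAMPLE_GROUPS)
reduced = reduce_sequence("MKTAYIAKQR", mapping)
print(reduced)
```

A predictor for contact number or relative solvent accessibility would then be trained on the reduced sequences instead of the 20-letter originals, so the features fed to the learner range over five symbols rather than twenty.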
Limitations
The study's findings may not generalize to all protein structure prediction problems, and the performance of reduced alphabets can vary based on the specific dataset.
Participant Demographics
The dataset included 1050 protein chains with specific selection criteria.
Statistical Information
P-Value
p<0.05
Statistical Significance
p<0.05
Digital Object Identifier (DOI)