Analyzing the Diversity of Biological Datasets for Drug Design
Author Information
Author(s): Varun Khanna, Shoba Ranganathan
Primary Institution: Macquarie University
Hypothesis
Are there any pharmaceutically relevant scaffolds or fragments present in metabolites and natural products that are missing in current lead libraries?
Conclusion
Current lead libraries do not utilize much of the scaffold space available in metabolites and natural products.
Supporting Evidence
- Drugs and metabolites share 6% of the total non-redundant scaffolds.
- Over 42% of the metabolite scaffolds are present in drugs.
- Current lead libraries do not cover much of metabolite scaffold space.
- Metabolites have a very narrow distribution of scaffolds.
- Drugs are the most diverse dataset, with a number of distinct scaffolds equal to 50% of the dataset size.
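
The percentages above are set-overlap and ratio statistics over scaffolds. The following is a minimal sketch of how such figures could be computed, assuming each scaffold has already been reduced to a canonical string (for example a canonical SMILES). The function names and the toy scaffold strings are hypothetical and are not taken from the study.

```python
def scaffold_overlap(drug_scaffolds, metabolite_scaffolds):
    """Return overlap percentages between two scaffold collections.

    Assumes both inputs are non-empty iterables of canonical scaffold strings.
    """
    drugs = set(drug_scaffolds)              # non-redundant drug scaffolds
    metabolites = set(metabolite_scaffolds)  # non-redundant metabolite scaffolds
    shared = drugs & metabolites
    total = drugs | metabolites
    return {
        # shared scaffolds as a share of all non-redundant scaffolds
        "shared_of_total_pct": 100 * len(shared) / len(total),
        # metabolite scaffolds that also occur in drugs
        "metabolite_in_drugs_pct": 100 * len(shared) / len(metabolites),
    }


def scaffold_diversity_pct(scaffold_per_molecule):
    """Distinct scaffolds as a percentage of dataset size (one scaffold per molecule)."""
    scaffolds = list(scaffold_per_molecule)
    return 100 * len(set(scaffolds)) / len(scaffolds)


# Toy data, purely illustrative
drugs = ["c1ccccc1", "c1ccncc1", "C1CCCCC1", "c1ccc2ccccc2c1"]
metabolites = ["c1ccccc1", "C1CCOC1"]
overlap = scaffold_overlap(drugs, metabolites)
diversity = scaffold_diversity_pct(["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1"])
```

Note that the two overlap figures use different denominators (the union of both scaffold sets versus the metabolite scaffolds alone), which is why the same shared set can give a small share of the total and a much larger share of the metabolite scaffolds.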
Takeaway
This study looked at different types of biological data to find out how many useful building blocks for drugs are missing from current drug libraries. It found that many important building blocks from natural sources are not being used.
Methodology
The study compared five different types of biological datasets using various molecular descriptors and clustering techniques.
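
The summary does not name the descriptors or the clustering method used. As one illustration of the general idea only, the sketch below groups molecules by Tanimoto similarity with a simple greedy "leader" pass. It assumes each molecule is represented as a set of integer fingerprint bits; the function names, the threshold and the toy fingerprints are hypothetical choices and not the authors' procedure.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two collections of fingerprint on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention here
    return len(a & b) / union if union else 1.0


def leader_cluster(fingerprints, threshold=0.5):
    """Greedy leader clustering.

    Each molecule joins the first cluster whose leader is at least
    `threshold` similar; otherwise it starts a new cluster as its leader.
    Returns a list of clusters, each a list of input indices.
    """
    clusters = []  # list of (leader_fingerprint, member_indices)
    for i, fp in enumerate(fingerprints):
        for leader, members in clusters:
            if tanimoto(leader, fp) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((fp, [i]))
    return [members for _, members in clusters]


# Toy fingerprints, purely illustrative
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 12, 13}]
groups = leader_cluster(fps)
```

The result of a leader pass depends on input order and on the threshold, so it is best read as a quick way to see how narrow or broad a dataset is, not as a definitive partition.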
Potential Biases
There may be biases in the datasets due to overrepresentation of certain types of compounds.
Limitations
The analysis may be limited by the datasets used and the inherent biases in the chemical space representation.