Structural diversity of biologically interesting datasets: a scaffold analysis approach
2011

Analyzing the Diversity of Biological Datasets for Drug Design

Sample size: 2000 · publication · 10 minutes · Evidence: moderate

Author Information

Author(s): Varun Khanna, Shoba Ranganathan

Primary Institution: Macquarie University

Hypothesis

Are there any pharmaceutically relevant scaffolds or fragments present in metabolites and natural products that are missing in current lead libraries?

Conclusion

Current lead libraries do not utilize much of the scaffold space available in metabolites and natural products.

Supporting Evidence

  • Drugs and metabolites share 6% of the total non-redundant scaffolds.
  • Over 42% of the metabolite scaffolds are present in drugs.
  • Current lead libraries do not cover much of metabolite scaffold space.
  • Metabolites have a very narrow distribution of scaffolds.
  • Drugs are the most diverse dataset, with a number of scaffolds equal to about 50% of the dataset size.
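
The percentages above use different denominators, which is easy to miss when reading them side by side. The following is a minimal sketch of how such figures could be computed, not the authors' pipeline. It assumes that "total non-redundant scaffolds" means the union of the two scaffold sets and that the diversity figure is the number of distinct scaffolds divided by the number of compounds. The function names and the toy scaffold labels are hypothetical, and real input would be canonical scaffold strings produced by a cheminformatics toolkit.

```python
def scaffold_overlap(scaffolds_a, scaffolds_b):
    """Overlap statistics for two collections of scaffold identifiers.

    Assumption: each compound is represented by one scaffold label
    (for example a canonical SMILES string); duplicates are allowed.
    """
    set_a, set_b = set(scaffolds_a), set(scaffolds_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        # Shared scaffolds as a fraction of all non-redundant scaffolds.
        "shared_of_union": len(shared) / len(union) if union else 0.0,
        # Fraction of A's scaffolds that also occur in B, and vice versa.
        "a_in_b": len(shared) / len(set_a) if set_a else 0.0,
        "b_in_a": len(shared) / len(set_b) if set_b else 0.0,
    }


def scaffold_ratio(scaffolds):
    """Distinct scaffolds divided by the number of compounds."""
    if not scaffolds:
        return 0.0
    return len(set(scaffolds)) / len(scaffolds)


# Toy data (hypothetical labels, one entry per compound).
drugs = ["benzene", "pyridine", "indole", "steroid", "benzene", "piperidine"]
metabolites = ["benzene", "purine", "benzene", "benzene", "steroid", "purine"]

stats = scaffold_overlap(drugs, metabolites)
print(f"shared / union:        {stats['shared_of_union']:.2f}")
print(f"metabolites in drugs:  {stats['b_in_a']:.2f}")
print(f"drugs in metabolites:  {stats['a_in_b']:.2f}")
print(f"drug scaffold ratio:   {scaffold_ratio(drugs):.2f}")
print(f"metabolite ratio:      {scaffold_ratio(metabolites):.2f}")
```

In the toy data the two sets share a third of all distinct scaffolds, yet two thirds of the metabolite scaffolds occur in drugs. The same effect explains how a small shared fraction of the total and a much larger fraction of metabolite scaffolds found in drugs can both hold when one dataset has far fewer distinct scaffolds than the other.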

Takeaway

This study looked at different types of biological data to find out how many useful building blocks for drugs are missing from current drug libraries. It found that many important building blocks from natural sources are not being used.

Methodology

The study compared five different types of biological datasets using various molecular descriptors and clustering techniques.

Potential Biases

There may be biases in the datasets due to overrepresentation of certain types of compounds.

Limitations

The analysis may be limited by the datasets used and the inherent biases in the chemical space representation.

Digital Object Identifier (DOI)

10.1186/1758-2946-3-30
