Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library
2011

BioStor: A Tool for Finding Articles in the Biodiversity Heritage Library

Sample size: 26,784 publications
Evidence: moderate

Author Information

Author(s): Roderic DM Page

Primary Institution: University of Glasgow

Conclusion

BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library.

Supporting Evidence

  • BioStor is available from http://biostor.org/.
  • The BHL archive comprises over 31 million pages scanned from books, monographs, and journals.
  • BioStor uses a service provided by bioGUID to find a journal's ISSN.

Takeaway

BioStor helps people find and access scientific articles in a huge library of old biological literature. It's like a search engine just for articles!

Methodology

A service was developed that locates articles in BHL by matching article metadata against BHL metadata, using approximate string matching, regular expressions, and string alignment.
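The first two matching techniques can be sketched in Python. This is a minimal illustration under stated assumptions, not BioStor's code: it assumes a citation arrives as a string of the form "Journal Title 12: 345-367" and that candidate BHL journal titles are available as a plain list. The regular expression, the function names `parse_citation`, `normalise`, and `best_title_match`, the 0.8 similarity threshold, and the sample titles are all hypothetical; approximate matching uses the standard-library `difflib.SequenceMatcher.ratio()` as a stand-in for whatever similarity measure BioStor actually uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical citation pattern: "<journal> <volume>: <start page>-<end page>",
# accepting either a hyphen or an en dash between the page numbers.
CITATION_RE = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>\d+)\s*:\s*(?P<spage>\d+)\s*[-\u2013]\s*(?P<epage>\d+)$"
)


def parse_citation(text):
    """Split a citation string into journal, volume and page range, or return None."""
    m = CITATION_RE.match(text.strip())
    if m is None:
        return None
    return {
        "journal": m.group("journal"),
        "volume": int(m.group("volume")),
        "spage": int(m.group("spage")),
        "epage": int(m.group("epage")),
    }


def normalise(title):
    """Lower-case and drop punctuation so trivial differences barely affect the score."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def best_title_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar title, or None if below threshold.

    The 0.8 threshold is a hypothetical choice for illustration.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, normalise(query), normalise(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best is None or best_score < threshold:
        return None
    return best, best_score


if __name__ == "__main__":
    # Hypothetical sample data, not taken from BHL.
    bhl_titles = [
        "Proceedings of the Zoological Society of London",
        "Annals and Magazine of Natural History",
    ]
    ref = parse_citation("Annals & Magazine of Natural History 12: 345-367")
    print(ref)
    print(best_title_match(ref["journal"], bhl_titles))
```

Normalising case and punctuation before comparing means that a variant such as "&" versus "and" costs a little similarity rather than causing an outright miss, which is the point of using approximate rather than exact matching.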

Potential Biases

The literature in BHL is a biased sample of the taxonomic literature.

Limitations

The ability to find articles depends on the quality of the metadata and OCR text, which can contain errors.
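String alignment, named under Methodology, is one way to tolerate OCR errors when checking whether a page really carries an article's title. The following is a minimal Smith-Waterman local alignment in Python; the scoring values (match +2, mismatch -1, gap -1), the 0.7 acceptance fraction, the function names `local_align_score` and `title_on_page`, and the sample OCR text are hypothetical choices for illustration, not BioStor's implementation.

```python
def local_align_score(query, text, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment (case-insensitive).

    Returns the best score of any substring of `query` aligned against any
    substring of `text`, plus the end position of that alignment in `text`.
    Scoring values are hypothetical defaults.
    """
    q, t = query.lower(), text.lower()
    cols = len(t) + 1
    prev = [0] * cols  # only the previous DP row is needed
    best, best_end = 0, 0
    for i in range(1, len(q) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap          # query character aligned to a gap
            left = curr[j - 1] + gap    # text character aligned to a gap
            curr[j] = max(0, diag, up, left)  # 0 lets an alignment restart anywhere
            if curr[j] > best:
                best, best_end = curr[j], j
        prev = curr
    return best, best_end


def title_on_page(title, page_text, min_fraction=0.7, match=2):
    """Accept the page if the alignment reaches a fraction of a perfect score.

    min_fraction=0.7 is a hypothetical threshold for illustration.
    """
    score, _ = local_align_score(title, page_text, match=match)
    return score >= min_fraction * match * len(title)


if __name__ == "__main__":
    # Hypothetical OCR text with two misread characters ("ncw", "spccies").
    ocr_page = "Descriptions of a ncw spccies of frog from Borneo"
    print(local_align_score("New species of frog", ocr_page))
    print(title_on_page("New species of frog", ocr_page))
    print(title_on_page("New species of frog", "Observations on the birds of Madagascar"))
```

Because local alignment is quadratic in the lengths of the two strings, it is probably best applied only to the few pages already shortlisted by metadata matching; that ordering is a design assumption, not something the source states.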

Digital Object Identifier (DOI)

10.1186/1471-2105-12-187
