Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
2007

Sample size: 604258 publication Evidence: high

Author Information

Author(s): Iseli Christian, Ambrosini Giovanna, Bucher Philipp, Jongeneel C. Victor

Primary Institution: Ludwig Institute for Cancer Research

Hypothesis

We investigated the performance of simple indexing strategies for handling searches of short words in genome sequences.

Conclusion

The fetchGWI program outperforms megablast for searches with more than 10,000 probes.

Supporting Evidence

  • fetchGWI is shown to be a versatile tool for rapidly searching multiple genomes.
  • The study compared the performance of fetchGWI and tagger against megablast.
  • The results indicate that a compressed, sorted word index accessed by dichotomic (binary) search outperforms the other approaches tested.
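
The idea of a sorted word index queried by dichotomic search can be illustrated with a minimal in-memory sketch. This is not the fetchGWI implementation or its on-disk format: the function names (encode, build_index, lookup), the 2-bit-per-base packing, and the toy genome are all assumptions made for illustration only.

```python
from bisect import bisect_left

# Assumed 2-bit code per base; a simple stand-in for a compressed word encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Pack a DNA word into one integer, 2 bits per base."""
    value = 0
    for base in word:
        value = (value << 2) | CODE[base]
    return value


def build_index(genome, k):
    """Return a sorted list of (encoded k-mer, position) pairs.

    Words containing characters other than A, C, G, T are skipped.
    """
    entries = []
    for pos in range(len(genome) - k + 1):
        word = genome[pos:pos + k]
        if all(base in CODE for base in word):
            entries.append((encode(word), pos))
    entries.sort()
    return entries


def lookup(index, word):
    """Find all positions of `word` by binary search on the sorted index.

    The query must have the same length k that was used to build the index.
    """
    key = encode(word)
    lo = bisect_left(index, (key, -1))      # first entry with this key
    hi = bisect_left(index, (key + 1, -1))  # first entry with a larger key
    return [pos for _, pos in index[lo:hi]]


index = build_index("ACGTACGTAC", 4)
hits = lookup(index, "ACGT")
```

Each lookup costs a number of comparisons logarithmic in the index size, which is why a sorted index suits large batches of probes: the index is built once and every probe is answered without rescanning the genome.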

Takeaway

This study created tools to quickly find short DNA sequences in large genome databases, making it easier for scientists to analyze genetic information.

Methodology

The study developed two programs, fetchGWI and tagger, to index and search short DNA sequences in genome databases.

Limitations

The performance of fetchGWI is limited by the speed of access to the filesystem.

Digital Object Identifier (DOI)

10.1371/journal.pone.0000579
