Indexing Strategies for Rapid Searches of Short Words in Genome Sequences
2007
Sample size: 604258
publication
Evidence: high
Author Information
Author(s): Christian Iseli, Giovanna Ambrosini, Philipp Bucher, C. Victor Jongeneel
Primary Institution: Ludwig Institute for Cancer Research
Hypothesis
We investigated the performance of simple indexing strategies for handling searches of short words in genome sequences.
Conclusion
The fetchGWI program outperforms megablast for searches with more than 10,000 probes.
Supporting Evidence
- fetchGWI is shown to be a versatile tool for rapidly searching multiple genomes.
- The study compared the performance of fetchGWI and tagger against megablast.
- The results indicate that a compressed, sorted word index accessed by dichotomic (binary) search outperforms the other approaches.
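
To make the idea of a sorted word index accessed by dichotomic search concrete, here is a minimal in-memory sketch. It is not the fetchGWI implementation: the 2-bit-per-base packing, the function names (pack, build_index, lookup), and the use of Python lists instead of an index file on disk are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Assumed encoding for this sketch: 2 bits per base.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(word):
    """Pack a DNA word into one integer, 2 bits per base."""
    value = 0
    for base in word:
        value = (value << 2) | CODE[base]
    return value


def build_index(genome, k):
    """Return parallel lists (keys, positions), sorted by packed k-mer.

    Words containing anything other than A, C, G, T are skipped.
    """
    entries = []
    for i in range(len(genome) - k + 1):
        word = genome[i:i + k]
        if all(base in CODE for base in word):
            entries.append((pack(word), i))
    entries.sort()
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def lookup(keys, positions, word):
    """Find all positions of `word` by binary search on the sorted keys."""
    key = pack(word)
    lo = bisect_left(keys, key)
    hi = bisect_right(keys, key)
    return positions[lo:hi]


if __name__ == "__main__":
    keys, positions = build_index("ACGTACGTTACG", 4)
    print(lookup(keys, positions, "ACGT"))
```

Each lookup costs a number of comparisons logarithmic in the index size, so the index is built once and can then serve many probes.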
Takeaway
This study created tools to quickly find short DNA sequences in large genome databases, making it easier for scientists to analyze genetic information.
Methodology
The study developed two programs, fetchGWI and tagger, to index and search short DNA sequences in genome databases.
Limitations
The performance of fetchGWI is limited by the speed of access to the filesystem.