ClustDB: A Tool for Fast Sequence Matching
Author Information
Author(s): Jürgen Kleffe, Friedrich Möller, Burghardt Wittig
Primary Institution: Institut für Molekularbiologie und Bioinformatik, Charité-Universitätsmedizin Berlin
Hypothesis
Can a new algorithm improve the identification of long similar substrings in large sets of sequences?
Conclusion
ClustDB is an efficient tool for finding long sections of similar sequences and detecting systematic errors in genomic data.
Supporting Evidence
- ClustDB can handle 16 times more data than the previous best program, VMATCH.
- The program took less than four hours to compare 3.3 GB of human ESTs.
- ClustDB identified 1215 complete matches in a set of 2020 Medicago truncatula BACs.
Takeaway
ClustDB helps scientists find similar DNA sequences quickly, which is important for understanding genes and fixing mistakes in genetic data.
Methodology
ClustDB uses a novel algorithm for match extension with errors and a partitioned suffix array method to compare large sets of sequences.
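The two ideas named above, suffix-array seeding processed in partitions and match extension that tolerates errors, can be illustrated with a toy sketch. This is not ClustDB's actual algorithm or code: the function names (`seed_pairs`, `extend_with_mismatches`), the choice to partition by first character, the naive sort-based suffix ordering, and the mismatch-only error model are all assumptions made for illustration, and the approach would not scale to gigabyte inputs.

```python
from itertools import combinations, groupby


def seed_pairs(seqs, k):
    """Yield pairs of (sequence id, position) that share an exact k-mer.

    Suffixes are processed one partition at a time (here: by first
    character, an illustrative assumption), so only one slice of the
    suffix ordering is held in memory at once.
    """
    alphabet = sorted(set("".join(seqs)))
    for first in alphabet:
        # Collect suffix start positions that belong to this partition.
        part = [
            (sid, pos)
            for sid, s in enumerate(seqs)
            for pos in range(len(s) - k + 1)
            if s[pos] == first
        ]
        # Naive suffix sort: fine for a toy example only.
        part.sort(key=lambda e: seqs[e[0]][e[1]:])
        # Suffixes sharing the same first k characters are now adjacent.
        for _, grp in groupby(part, key=lambda e: seqs[e[0]][e[1]:e[1] + k]):
            for a, b in combinations(list(grp), 2):
                if a[0] != b[0]:  # only report matches between sequences
                    yield tuple(sorted((a, b)))


def extend_with_mismatches(seqs, a, b, k, max_mismatches):
    """Extend an exact k-mer seed right, then left, allowing mismatches.

    Returns (start_in_a, start_in_b, length, mismatches_used).
    The extension stops at a mismatch once the budget is spent.
    """
    (sid_a, i), (sid_b, j) = a, b
    x, y = seqs[sid_a], seqs[sid_b]
    errors = 0
    right = k  # the seed itself is an exact match of length k
    while i + right < len(x) and j + right < len(y):
        if x[i + right] != y[j + right]:
            if errors == max_mismatches:
                break
            errors += 1
        right += 1
    left = 0
    while i - left - 1 >= 0 and j - left - 1 >= 0:
        if x[i - left - 1] != y[j - left - 1]:
            if errors == max_mismatches:
                break
            errors += 1
        left += 1
    return (i - left, j - left, left + right, errors)


if __name__ == "__main__":
    seqs = ["AAACGTACGTTT", "CCACGTTCGTGG"]
    for a, b in sorted(set(seed_pairs(seqs, 4))):
        print(a, b, extend_with_mismatches(seqs, a, b, 4, 1))
```

In this sketch, a longer seed length `k` reduces the number of spurious seeds, while `max_mismatches` controls how far a seed can be extended through differing positions. Different seeds can extend to the same or overlapping matches, so a practical tool would also need to merge or deduplicate the results.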
Limitations
ClustDB may handle low-complexity sequences poorly, which can produce meaningless matches.