Simultaneous identification of long similar substrings in large sets of sequences

2007

ClustDB: A Tool for Fast Sequence Matching

Sample size: 1377 publication Evidence: high

Author Information

Author(s): Kleffe Jürgen, Möller Friedrich, Wittig Burghardt

Primary Institution: Institut für Molekularbiologie und Bioinformatik, Charité-Universitätsmedizin Berlin

Can a new algorithm improve the identification of long similar substrings in large sets of sequences?

ClustDB is an efficient tool for finding long similar substrings across large sets of sequences and for detecting systematic errors in genomic data.

ClustDB can handle 16 times more data than the previous best program, VMATCH.
The program took less than four hours to compare 3.3 GB of human ESTs.
ClustDB identified 1215 complete matches in a set of 2020 Medicago truncatula BACs.

ClustDB helps scientists find similar DNA sequences quickly, which is important for understanding genes and fixing mistakes in genetic data.

ClustDB uses a novel algorithm for match extension with errors and a partitioned suffix array method to compare large sets of sequences.
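
The general idea behind suffix-array matching followed by extension with errors can be illustrated with a toy sketch. This is not ClustDB's implementation: the summary does not describe the actual partitioning scheme or extension algorithm, so the function names (shared_substring_clusters, extend_right), the naive suffix array construction, and the substitution-only error model below are all assumptions made for illustration. The sketch also keeps everything in a single in-memory array, whereas a partitioned approach would split the work to handle larger inputs.

```python
from bisect import bisect_right


def lcp(text, i, j):
    """Length of the longest common prefix of the suffixes starting at i and j."""
    n = 0
    while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
        n += 1
    return n


def shared_substring_clusters(seqs, min_len):
    """Group positions whose suffixes share an exact prefix of at least min_len.

    Hypothetical helper: returns a list of clusters, each a sorted list of
    (sequence_index, offset) pairs drawn from at least two different sequences.
    """
    # Join the sequences, giving each its own unique separator character so
    # that no common prefix can run across a sequence boundary.
    parts, starts, pos = [], [], 0
    for idx, s in enumerate(seqs):
        starts.append(pos)
        parts.append(s + chr(0xE000 + idx))
        pos += len(s) + 1
    text = "".join(parts)
    if not text:
        return []

    # Naive suffix array construction; acceptable only for toy inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    def locate(p):
        sid = bisect_right(starts, p) - 1
        return sid, p - starts[sid]

    # Consecutive suffixes with a long enough common prefix form one run.
    clusters, run = [], [sa[0]]
    for prev, cur in zip(sa, sa[1:]):
        if lcp(text, prev, cur) >= min_len:
            run.append(cur)
        else:
            if len(run) > 1:
                clusters.append(sorted(locate(p) for p in run))
            run = [cur]
    if len(run) > 1:
        clusters.append(sorted(locate(p) for p in run))

    # Keep only runs that involve at least two different sequences.
    return [c for c in clusters if len({sid for sid, _ in c}) > 1]


def extend_right(a, i, b, j, max_mismatches):
    """Extend a match to the right from a[i] and b[j], tolerating substitutions.

    Returns the length of the longest extension that ends on a matching
    character and contains at most max_mismatches mismatches. Insertions and
    deletions are not handled in this sketch.
    """
    length = best = mismatches = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] == b[j + length]:
            length += 1
            best = length
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
            length += 1
    return best
```

For example, calling shared_substring_clusters(["ACGTACGTTT", "GGACGTACGA", "TTTTTTCCCC"], 6) reports the positions where the first two sequences share exact substrings of length 6 or more, and extend_right can then be applied to each seed to see how far the similarity continues once a few mismatches are allowed.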

ClustDB may not handle low-complexity sequences well, which can produce meaningless matches.
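
One common way to guard against this kind of problem is to screen out low-complexity regions before matching. The sketch below uses the Shannon entropy of the base composition as a crude complexity measure. It is an illustration only, not a feature of ClustDB as described here; the function names and the threshold of 1.5 bits are assumptions, and a composition-based measure will not flag every repeat (a strict ATATAT repeat, for instance, still scores 1.0 bit).

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the character composition of seq."""
    if not seq:
        return 0.0
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def is_low_complexity(seq, threshold=1.5):
    """Flag seq as low complexity when its entropy falls below threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return shannon_entropy(seq) < threshold
```

A homopolymer run such as "AAAAAAAAAA" has entropy 0 and is flagged, while a sequence using all four bases in roughly equal proportion approaches the maximum of 2 bits and passes.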
