MotifEnumerator Software

MotifEnumerator implements an improved pattern-driven algorithm for motif finding in DNA sequences, proposed by Sze and Zhao (2006).

Depending on whether mismatches and don't cares are allowed, MotifEnumerator has different time and space complexities when l is large enough:

where k is the number of sequences, n is the length of each sequence and l is the motif length.


Using MotifEnumerator

The MotifEnumerator source code consists of a single file, motifenumerator.c. It can be compiled under Unix, Linux, or Windows (Cygwin) with the command "gcc -O3 -o motifenumerator motifenumerator.c".

MotifEnumerator needs four command-line parameters in the following order:

The input sample is read from stdin as a set of sequences in FASTA format. The output, written to stdout, shows a set of non-overlapping motifs with e-value below 1.0.
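Before running MotifEnumerator, it can be useful to check how many sequences the input sample contains and how long each one is (the k and n used above). The following is a minimal sketch of such a check and is not part of motifenumerator.c. The function name read_fasta_lengths is hypothetical, and the sketch assumes plain FASTA in which header lines start with '>' and every alphabetic character on the other lines is a residue.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from motifenumerator.c.
 * Reads FASTA text from `in` and stores the residue count of each record
 * in lengths[0..k-1]. Returns k, the number of records, or -1 if the
 * input holds more than max_seqs records.
 */
int read_fasta_lengths(FILE *in, size_t *lengths, int max_seqs)
{
    int c;
    int k = 0;
    int at_line_start = 1;  /* next character begins a new line */
    int in_header = 0;      /* currently inside a '>' header line */

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            at_line_start = 1;
            in_header = 0;
        } else if (at_line_start && c == '>') {
            if (k == max_seqs)
                return -1;
            lengths[k++] = 0;   /* start a new record */
            in_header = 1;
            at_line_start = 0;
        } else {
            at_line_start = 0;
            /* Count letters only; whitespace and '\r' are skipped. */
            if (!in_header && k > 0 && isalpha(c))
                lengths[k - 1]++;
        }
    }
    return k;
}
```

Calling read_fasta_lengths(stdin, lengths, max_seqs) on the same file that will be piped into motifenumerator gives a quick view of the sample size before the search is started.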

The occurrences of each motif are displayed in the format "string seq/pos", where "string" is the motif occurrence, "seq" is the sequence name, and "pos" is the position within the sequence ('-' means reverse strand, while '+' denotes occurrences added after refinement).
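For post-processing the output, an occurrence line in the "string seq/pos" format can be split into its fields. The sketch below is one possible way to do this and is not part of MotifEnumerator. The struct, the function name parse_occurrence and the buffer sizes are hypothetical. Because the exact placement of the '-' and '+' markers is not specified here, the sketch keeps "pos" as raw text and only tests whether each marker appears in it. It also assumes "string" and "seq" are separated by whitespace and that the last '/' on the line precedes "pos".

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One parsed occurrence line; field sizes are arbitrary for this sketch. */
struct occurrence {
    char site[256];  /* the motif occurrence ("string") */
    char seq[256];   /* the sequence name ("seq") */
    char pos[64];    /* the raw position field ("pos"), markers included */
    int reverse;     /* 1 if pos contains '-' (reverse strand) */
    int refined;     /* 1 if pos contains '+' (added after refinement) */
};

/* Returns 1 on success, 0 if the line does not look like "string seq/pos". */
int parse_occurrence(const char *line, struct occurrence *out)
{
    char rest[512];
    char *slash;
    size_t n;

    assert(line != NULL && out != NULL);

    /* First token is the occurrence; the remainder holds "seq/pos". */
    if (sscanf(line, "%255s %511[^\n]", out->site, rest) != 2)
        return 0;

    /* Drop trailing whitespace such as '\r' or spaces. */
    n = strlen(rest);
    while (n > 0 && isspace((unsigned char)rest[n - 1]))
        rest[--n] = '\0';

    /* Split at the last '/', so sequence names may contain '/'. */
    slash = strrchr(rest, '/');
    if (slash == NULL || slash == rest || slash[1] == '\0')
        return 0;
    *slash = '\0';
    if (strlen(rest) >= sizeof out->seq || strlen(slash + 1) >= sizeof out->pos)
        return 0;

    strcpy(out->seq, rest);
    strcpy(out->pos, slash + 1);
    out->reverse = strchr(out->pos, '-') != NULL;
    out->refined = strchr(out->pos, '+') != NULL;
    return 1;
}
```

A caller could read the program's stdout line by line with fgets and pass each line to parse_occurrence. Lines for which it returns 0 would need to be handled separately.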

For each motif, an additional line following the occurrences displays the motif pattern before refinement (with '-' denoting don't cares), the value of d, and the e-value.

Examples (l=12, num_strand=1, and input sample.fasta):


Reference

Sze S.-H. and Zhao X. (2006) Improved pattern-driven algorithms for motif finding in DNA sequences. Proceedings of the 2005 Joint RECOMB Satellite Workshops on Systems Biology and Regulatory Genomics, Lecture Notes in Bioinformatics, 4023, 198-211.