ISPAlign Software

ISPAlign (Intermediate Sequence Profile Alignment) is a multiple sequence alignment program. It automatically performs a database search from each input sequence, defines an appropriate subset of intermediate sequences from among the hits, and uses a greedy strategy to select a small subset of intermediate sequences that are far away from each other to add to the input sequences. Each intermediate sequence is then assigned to one of the sequences in the enlarged set of input sequences, so that a profile is constructed for each sequence in the enlarged set. A profile alignment of the enlarged sequence set is performed by modifying the pair-HMM approach in ProbCons to incorporate profiles and to utilize secondary structure predictions as in SPEM. Finally, a multiple alignment of the original input sequences is returned as output.
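
The greedy selection step can be illustrated with a minimal Python sketch. This is not the ISPAlign implementation: the function names greedy_select and distance are hypothetical, the distance is a simple stand-in based on Python's difflib rather than the measure ISPAlign actually uses, and the farthest-first rule shown here is only one plausible reading of "far away from each other".

```python
from difflib import SequenceMatcher


def distance(a, b):
    # Stand-in distance (an assumption, not ISPAlign's measure):
    # 1 minus the difflib similarity ratio, so identical strings give 0.0.
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def greedy_select(inputs, candidates, k, dist=distance):
    """Pick up to k candidates by a farthest-first greedy rule.

    At each step, choose the candidate whose nearest sequence in the
    current set (inputs plus earlier picks) is as far away as possible.
    """
    if not inputs:
        raise ValueError("at least one input sequence is required")
    chosen = list(inputs)        # enlarged set, grows as we pick
    remaining = list(candidates)
    picked = []
    while remaining and len(picked) < k:
        best = max(remaining, key=lambda c: min(dist(c, s) for s in chosen))
        picked.append(best)
        chosen.append(best)
        remaining.remove(best)
    return picked


if __name__ == "__main__":
    hits = ["AAAA", "AAAT", "TTTT"]
    print(greedy_select(["AAAA"], hits, k=2))
```

In this toy run the hit identical to the input is never preferred, because its distance to the current set is zero; candidates that differ most from everything already chosen are added first.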

We test ISPAlign on benchmark multiple alignments including BAliBASE 3.0, HOMSTRAD, PREFAB 4.0, and SABmark 1.65, and show that it significantly outperforms MAFFT 5.8 and ProbCons 1.10, which are among the best multiple alignment algorithms that do not utilize additional hits from database search, and SPEM, which is among the best multiple alignment algorithms that utilize additional hits from database search.


Installing ISPAlign

The ISPAlign source code is available for download and can be compiled in a Unix/Linux environment. The source code includes versions of the NCBI C ToolKit, SSEARCH, CD-HIT, PSIPRED, and a modified version of ProbCons. The following steps will create a directory called ispalign. Further instructions are in the README file.

The results of ISPAlign on benchmark multiple alignments are also included for download. The following steps will create a directory called ispalign_results.


Reference

Lu Y. and Sze S.-H. (2008) Multiple sequence alignment based on profile alignment of intermediate sequences. Journal of Computational Biology, 15, 767-777. (Also appears in Proceedings of the 11th Annual International Conference on Research in Computational Molecular Biology (RECOMB'2007), Lecture Notes in Bioinformatics, 4453, 283-295.)