INSTALLATION

1. Type ./install to install SClassify.
2. Either move the executable file sclassify to a directory on the search path or add the current directory to the search path.

INPUT

The program assumes that the e-values between each unclassified protein and each protein in existing families, and between each pair of unclassified proteins, have already been computed by other software such as BLAST or SSEARCH. It reads the following files:

1. A two-column tab-separated file that lists the name of each protein in existing families and the name of its family (example file: pfam.list).
2. A one-column file that lists the name of each unclassified protein (example file: test.list).
3. A three-column tab-separated file that lists the e-values between unclassified proteins and proteins in existing families. Each line gives the name of an unclassified protein, the name of a protein in an existing family, and the e-value between them. Pairs without an e-value can simply be left out. This file is optional (example files: blast/test_pfam.score, ssearch/test_pfam.score).
4. A three-column tab-separated file that lists the e-values between pairs of unclassified proteins. Each line gives the names of two unclassified proteins and the e-value between them. Pairs without an e-value can simply be left out. This file is optional (example files: blast/test_test.score, ssearch/test_test.score).

USAGE

./sclassify -c infile1 -u infile2 -p infile3 -n infile4 -e cutoff -o outfile

where infile1 to infile4 are input files 1 to 4 described above, cutoff is the e-value cutoff, and outfile is the output file.
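Before running sclassify, it can help to confirm that the input files match the formats listed under INPUT. The following is a minimal sketch, not part of SClassify: the names parse_families, parse_unclassified, parse_scores, unknown_names and summarize are hypothetical. It assumes protein names contain no tabs and that e-values parse as Python floats (a bare exponent such as "e-180" is read as 1e-180, in case a file contains that style). It only reports score-file names missing from the list files; how sclassify itself treats such lines is not documented here.

```python
def parse_evalue(text):
    """Convert an e-value string to a float.

    Assumption: a bare exponent such as "e-180" is read as 1e-180.
    """
    text = text.strip()
    if text[:1] in ("e", "E"):
        text = "1" + text
    return float(text)


def _rows(lines):
    """Yield the tab-separated fields of each non-blank line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            yield line.split("\t")


def parse_families(lines):
    """Input file 1: protein<TAB>family per line -> {protein: family}."""
    families = {}
    for fields in _rows(lines):
        if len(fields) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {fields!r}")
        families[fields[0]] = fields[1]
    return families


def parse_unclassified(lines):
    """Input file 2: one protein name per line -> list of names."""
    return [fields[0].strip() for fields in _rows(lines)]


def parse_scores(lines):
    """Input files 3 and 4: name<TAB>name<TAB>e-value -> list of triples."""
    scores = []
    for fields in _rows(lines):
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated columns, got {fields!r}")
        scores.append((fields[0], fields[1], parse_evalue(fields[2])))
    return scores


def unknown_names(scores, left_names, right_names):
    """Return sorted names from score triples that are absent from the list files."""
    unknown = set()
    for left, right, _ in scores:
        if left not in left_names:
            unknown.add(left)
        if right not in right_names:
            unknown.add(right)
    return sorted(unknown)


def _load(path, parser):
    """Open a file and parse it completely before closing it."""
    with open(path) as handle:
        return parser(handle)


def summarize(family_list, unclassified_list, pfam_scores=None, pair_scores=None):
    """Count entries in files 1 and 2 and check names in optional files 3 and 4."""
    families = _load(family_list, parse_families)
    unclassified = set(_load(unclassified_list, parse_unclassified))
    summary = {
        "classified_proteins": len(families),
        "families": len(set(families.values())),
        "unclassified_proteins": len(unclassified),
    }
    if pfam_scores is not None:
        scores = _load(pfam_scores, parse_scores)
        summary["unknown_names_file3"] = unknown_names(scores, unclassified, set(families))
    if pair_scores is not None:
        scores = _load(pair_scores, parse_scores)
        summary["unknown_names_file4"] = unknown_names(scores, unclassified, unclassified)
    return summary
```

For example, summarize("pfam.list", "test.list", "blast/test_pfam.score", "blast/test_test.score") returns the protein and family counts together with any names in the two score files that do not appear in pfam.list or test.list. Both score arguments can be omitted, mirroring the optional files 3 and 4.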
EXAMPLES

./sclassify -c pfam.list -u test.list -p blast/test_pfam.score -n blast/test_test.score -e 0.1 -o test.out
./sclassify -c pfam.list -u test.list -p blast/test_pfam.score -e 1e-10 -o test.out
./sclassify -c pfam.list -u test.list -p ssearch/test_pfam.score -n ssearch/test_test.score -e 0.1 -o test.out
./sclassify -c pfam.list -u test.list -p ssearch/test_pfam.score -e 1e-10 -o test.out

OUTPUT

The output file is a two-column tab-separated file that lists the name of each classified protein and the name of its assigned family. A distinct name is generated for each new family, and the same name is used for all proteins classified to that family.

SCRIPTS

Two Python scripts are provided to convert BLAST and SSEARCH results to the three-column tab-separated format described under INPUT.

1. BLAST converter

Usage:
python convert_blast.py infile outfile
where infile contains the BLAST results and outfile is the output file.

Examples:
python convert_blast.py blast/test_pfam.blast blast/test_pfam.score
python convert_blast.py blast/test_test.blast blast/test_test.score

Note: If BLAST is run with option -m 8 (tabular output), the Python script is not needed; the query name, subject name and e-value (columns 1, 2 and 11) can be extracted directly with cut. Example:
blastall -p blastp -m 8 -d pfam -i test.fasta | cut -f 1,2,11 > blast/test_pfam.score

2. SSEARCH converter

Usage:
python convert_ssearch.py infile outfile
where infile contains the SSEARCH results and outfile is the output file.

Examples:
python convert_ssearch.py ssearch/test_pfam.ssearch ssearch/test_pfam.score
python convert_ssearch.py ssearch/test_test.ssearch ssearch/test_test.score
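The -m 8 shortcut keeps columns 1, 2 and 11 of BLAST's tabular output (query, subject, e-value). The sketch below does the same in Python and can optionally collapse a query/subject pair that BLAST reports more than once (one line per HSP) into the line with the smallest e-value. It is a hypothetical helper, not a replacement for convert_blast.py; the names blast_m8_to_scores, format_scores and best_only are illustrative. Collapsing repeats is an assumption, since this README does not say how sclassify treats repeated pairs.

```python
def parse_evalue(text):
    """Read an e-value; a bare exponent such as "e-180" is read as 1e-180."""
    text = text.strip()
    if text[:1] in ("e", "E"):
        text = "1" + text
    return float(text)


def blast_m8_to_scores(lines, best_only=True):
    """Turn BLAST -m 8 tabular lines into (query, subject, e-value) triples.

    Keeps columns 1, 2 and 11, as `cut -f 1,2,11` does, and leaves the
    e-value as the original string. With best_only=True, a pair reported
    on several lines is kept once, with its smallest e-value (assumption:
    one line per pair is what sclassify should see).
    """
    triples = []
    index = {}  # (query, subject) -> position of that pair in triples
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -m 9)
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        triple = (fields[0], fields[1], fields[10])
        key = triple[:2]
        if not best_only:
            triples.append(triple)
        elif key not in index:
            index[key] = len(triples)
            triples.append(triple)
        elif parse_evalue(triple[2]) < parse_evalue(triples[index[key]][2]):
            triples[index[key]] = triple
    return triples


def format_scores(triples):
    """Render triples in the three-column tab-separated score format."""
    return "".join(f"{q}\t{s}\t{e}\n" for q, s, e in triples)
```

With best_only=False the result matches the cut pipeline line for line on -m 8 output. Calling format_scores(blast_m8_to_scores(handle)) on an open file of -m 8 output produces text that can be written to a .score file such as blast/test_pfam.score.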