More information is available at http://faculty.cse.tamu.edu/shsze/gcfinder.

INSTALLATION

1. Type ./install to install GCFinder.
2. Either move the executable files gcfinder, osynm and synm to a directory
   on the search path or add the current directory to the search path.

INPUT

The following files are needed:

1. A file specifying file names that define the homologous groups. Each
   organism starts with a line containing ">" and its name, followed by one
   or more lines that specify a file name for each chromosome.
   Example:
     >bsu
     bsu_cog
     >spy
     spy_cog
     >spn
     spn_cog
     >cac
     cac_cog

2. A file specifying file names that give the gene names. Each organism
   starts with a line containing ">" and its name, followed by one or more
   lines that specify a file name for each chromosome. The order of the file
   names should be the same as in 1.
   Example:
     >bsu
     bsu_gname
     >spy
     spy_gname
     >spn
     spn_gname
     >cac
     cac_gname

3. Files that define the homologous groups. In each row, the first number
   shows the position of the gene on the chromosome and the other numbers
   give the corresponding homologous group IDs.
   Example 1 (each gene belongs to one homologous group):
     1 593
     2 592
     19 2812
     20 718
     21 353
     25 4915
     26 3853
     27 1982
   Example 2 (each gene may belong to many homologous groups):
     38 73 143
     39 84
     40 3583 3584
     41 1658
     42 30
     46 1947
     47 503
     48 251
     49 2088
     50 1207
     51 462

4. Files that give the gene names. The first column shows the position of
   the gene on the chromosome and the second column gives the name of the
   gene.
   Example:
     1 BSU00010
     2 BSU00020
     3 BSU00030
     4 BSU00040
     6 BSU00060
     7 BSU00070
     9 BSU00090
     10 BSU00100
     11 BSU00110
     12 BSU00120
     13 BSU00130
     14 BSU00140
     15 BSU00150
     16 BSU00160
     17 BSU00170
     18 BSU00180
     19 BSU00190
     20 BSU00200

USAGE

gcfinder -h=filelist -g=gfilelist -t=1 -u=1 -a=2 -d=50 -e=1e-5 -o=result

Command line parameters:

-h= "file name specifying files that define the homologous groups"
-g= "file name specifying files that give the gene names"
-t= "type of gene clusters"
    0 -- ordered clusters
    1 -- unordered clusters
-u= "type of genomes"
    0 -- linear
    1 -- circular
-a= "minimum number of genomes in which a gene cluster must appear"
-d= "maximum size of gene clusters"
-e= "e-value cutoff, only gene clusters with lower e-value are returned"
-o= "output file name"

OUTPUT

Each gene cluster is shown beginning with "Cluster:" and a list of
homologous group IDs in the cluster. "Expect" gives the e-value. "Size"
gives the average size of the cluster and the average number of genes in the
chromosomes. "Appear" gives the number of chromosomes in which the cluster
appears and the total number of chromosomes.

The rest of the lines show specific genes on each chromosome. The leading
string is the name of the organism. "Chr:" shows the chromosome number, "S"
is the starting gene position, and "E" is the ending gene position. For each
gene, the homologous group ID is given, and its position and name are shown
in parentheses.

Example:

Cluster: 642 745 1136
Expect = 8.6456e-06, Size = 3/2894, Appear = 4/4
bsu_cog (Chr:1 S3301 E3326): 642 (3301, BSU33020) 642 (3320, BSU33210) 745 (3321, BSU33220) 1136 (3326, BSU33270)
spy_cog (Chr:1 S1554 E1557): 642 (1554, SPy_2026) 745 (1555, SPy_2027) 1136 (1557, SPy_2031)
spn_cog (Chr:1 S1525 E1546): 642 (1525, SP_1632) 745 (1526, SP_1633) 1136 (1546, SP_1653)
cac_cog (Chr:1 S364 E366): 745 (364, CAC0371) 642 (365, CAC0372) 1136 (366, CAC0373)

A list of maximal unordered gene clusters found in four bacterial genomes B. subtilis, S. pyogenes, S.
pneumoniae and C. acetobutylicum is in bacteria/result, which is obtained by
the command in bacteria/run.

A list of maximal unordered gene clusters that appear in four yeast genomes
S. cerevisiae, S. paradoxus, S. mikatae and S. bayanus is in yeast/result,
which is obtained by the command in yeast/run. The maximum size of a gene
cluster is constrained to be 50.
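
The result files follow the layout described under OUTPUT, so they can be
post-processed with a short script. The following Python sketch is not part
of GCFinder; the function name parse_clusters, the dictionary keys and the
regular expressions are assumptions inferred from the single example above.
Whitespace is matched loosely because the exact line breaks in real output
may differ.

```python
import re

# Patterns inferred from the README example only (an assumption, not a
# specification of GCFinder's output). Whitespace is matched loosely.
HEADER_RE = re.compile(
    r"\s*([\d\s]+?)\s*Expect\s*=\s*([^,\s]+),\s*"
    r"Size\s*=\s*([\d.]+)/([\d.]+),\s*Appear\s*=\s*(\d+)/(\d+)"
)
OCC_RE = re.compile(r"(\S+)\s*\(Chr:(\d+)\s+S(\d+)\s+E(\d+)\):")
GENE_RE = re.compile(r"(\d+)\s*\((\d+),\s*([^)\s]+)\)")

SAMPLE = """Cluster: 642 745 1136
Expect = 8.6456e-06, Size = 3/2894, Appear = 4/4
bsu_cog (Chr:1 S3301 E3326): 642 (3301, BSU33020) 642 (3320, BSU33210) 745 (3321, BSU33220) 1136 (3326, BSU33270)
spy_cog (Chr:1 S1554 E1557): 642 (1554, SPy_2026) 745 (1555, SPy_2027) 1136 (1557, SPy_2031)
spn_cog (Chr:1 S1525 E1546): 642 (1525, SP_1632) 745 (1526, SP_1633) 1136 (1546, SP_1653)
cac_cog (Chr:1 S364 E366): 745 (364, CAC0371) 642 (365, CAC0372) 1136 (366, CAC0373)
"""


def parse_clusters(text):
    """Return one dict per "Cluster:" block found in text."""
    clusters = []
    # Everything before the first "Cluster:" is ignored.
    for block in text.split("Cluster:")[1:]:
        head = HEADER_RE.match(block)
        if head is None:
            continue  # skip blocks that do not match the assumed layout
        cluster = {
            "groups": [int(g) for g in head.group(1).split()],
            "expect": float(head.group(2)),
            "size": (float(head.group(3)), float(head.group(4))),
            "appear": (int(head.group(5)), int(head.group(6))),
            "occurrences": [],
        }
        occs = list(OCC_RE.finditer(block, head.end()))
        for i, occ in enumerate(occs):
            # Genes of this occurrence run up to the next occurrence header.
            stop = occs[i + 1].start() if i + 1 < len(occs) else len(block)
            genes = [
                (int(group_id), int(position), name)
                for group_id, position, name in GENE_RE.findall(
                    block, occ.end(), stop
                )
            ]
            cluster["occurrences"].append(
                {
                    "organism": occ.group(1),
                    "chr": int(occ.group(2)),
                    "start": int(occ.group(3)),
                    "end": int(occ.group(4)),
                    "genes": genes,  # (group ID, position, gene name)
                }
            )
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    for c in parse_clusters(SAMPLE):
        print(c["groups"], c["expect"], len(c["occurrences"]))
```

To read a real result file such as bacteria/result, pass its contents to
parse_clusters in place of SAMPLE, for example
parse_clusters(open("bacteria/result").read()).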