Some string-processing programs take other programs as input. One such program is the C compiler, cc, that you use to translate your C programs to machine language. The compiler accepts as input a file full of strings, and produces machine language output. Let's look at a less ambitious program that does one of the steps of the C compiler: a program to remove comments from a C program. This program will read in a C program from the standard input and print out an equivalent C program on the standard output with all the comments removed:
    #include <stdio.h>
    #include <stdlib.h>

    int main ()
    {
        char s[1000];
        int i, commenting;

        /* currently, "commenting" is false, i.e., we're not in a comment */
        commenting = 0;

        for (;;) {
            /* get a string from the file; if we're at the
             * end of file, get out of the loop */
            if (fgets (s, 1000, stdin) == NULL) break;

            /* for each character in the string... */
            for (i=0; s[i]; i++) {

                /* if we aren't yet commenting and see "/*",
                 * then we know we're starting a comment; also
                 * step past the star so it can't be mistaken
                 * for the beginning of an end-of-comment mark */
                if (!commenting && s[i] == '/' && s[i+1] == '*') {
                    commenting = 1;
                    i++;
                    continue;
                }

                /* if we're not commenting, print the current char */
                if (!commenting) putchar (s[i]);

                /* if we are currently commenting and see
                 * "(you know)", end commenting and increment
                 * i past the / */
                if (commenting && s[i] == '*' && s[i+1] == '/') {
                    commenting = 0;
                    i++;
                }
            }
        }

        /* if we're still in a comment, warn the user */
        if (commenting) fprintf (stderr, "unfinished comment!\n");
        exit (0);
    }
Here is another string-processing program: it reads the standard input and prints each line that contains the string given on the command line, along with its line number:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * This function returns the position of a substring in a string
     */
    int substring_position (char haystack[], char needle[])
    {
        int i, len;

        /* find the length of the thing we're looking for */
        len = strlen (needle);

        /* search at each position in the string */
        for (i=0; haystack[i]; i++) {

            /* if we find it, return the position */
            if (strncmp (&haystack[i], needle, len) == 0) return i;
        }

        /* didn't find it?  return -1. */
        return -1;
    }

    int main (int argc, char *argv[])
    {
        char s[1000];
        int line_number;

        /* the command line argument is the string to search for */
        if (argc != 2) {
            fprintf (stderr, "Usage: %s <string>\n", argv[0]);
            exit (1);
        }

        /* start out at line number 1 */
        line_number = 1;

        /* while not at end of file, get some strings */
        for (;;) {

            /* read in a string; end of file?  get out of the loop */
            if (fgets (s, 1000, stdin) == NULL) break;

            /* if we find the substring... */
            if (substring_position (s, argv[1]) != -1) {

                /* print the line number and string */
                printf ("%d: %s", line_number, s);
            }

            /* one more line... */
            line_number++;
        }
        exit (0);
    }

This program uses a trick we haven't discussed much in class yet; the line where it says:
    if (strncmp (&haystack[i], needle, len) == 0) ...

The &haystack[i] part essentially says "treat the i'th position of haystack as the first position of an array, and pass that array to strncmp." Really, we are passing the address of haystack[i] to strncmp, which will treat that address as a pointer to the first element of an array; the address after that is the second element, and so forth. We can't get very far with strings without talking about pointers, so that's next...