Let's look at a program that reads an unordered list of numbers from standard input, stores them in an array, sorts the array, and then prints the numbers in ascending order. One way to write this program is to use a selection sort. This kind of sort searches the whole array for the minimum element and swaps it with element 0. It then searches elements 1 through the last for the minimum and swaps it with element 1, and so on, until the whole array is in order:
#include <stdio.h>
#include <stdlib.h>

void selection_sort (float [], int);
int read_numbers (float [], int);
void swap (float [], int, int);
int find_minimum (float [], int, int);

/* main program */
int main (void)
{
    int i, num;
    float w[1000];

    /* get some numbers into the array */
    num = read_numbers (w, 1000);

    /* sort them */
    selection_sort (w, num);

    /* print them */
    for (i = 0; i < num; i++)
        printf ("%f\n", w[i]);

    exit (0);
}

/* this function reads up to 'max' floats into 'v'
 * and returns the number of floats read */
int read_numbers (float v[], int max)
{
    int i;
    float x;

    i = 0;

    /* keep going while scanf still converts a number; it stops at
     * end of input or at anything that isn't a number */
    while (scanf ("%f", &x) == 1) {
        /* don't want to overflow array */
        if (i >= max) {
            fprintf (stderr, "too many!\n");
            exit (1);
        }
        /* one more number */
        v[i++] = x;
    }

    /* 'i' is now the number of floats read in */
    return i;
}

/* this function does a selection sort on 'v' */
void selection_sort (float v[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        swap (v, i, find_minimum (v, i, n));
}

/* this function returns the index of the minimum element of 'v'
 * from 'first' up to, but not including, 'last' */
int find_minimum (float v[], int first, int last)
{
    int i, mini;

    /* mini tracks the lowest known element; currently the first */
    mini = first;

    /* go through all the rest looking for a lower element */
    for (i = first + 1; i < last; i++)
        if (v[i] < v[mini])
            mini = i;

    return mini;
}

/* this function exchanges the 'i'th and 'j'th elements of 'v' */
void swap (float v[], int i, int j)
{
    float t;

    t = v[i];
    v[i] = v[j];
    v[j] = t;
}
Once numbers are in an array, we often want to find a particular one. The simplest way is a linear search, which compares each element with the target in turn:

/* this function returns the index of an item in the array,
 * or -1 if the item isn't in the array */
int linear_search (float v[], int n, float target)
{
    int i;

    for (i = 0; i < n; i++)
        if (v[i] == target)
            return i;
    return -1;
}

How many comparisons does this search make, in terms of the size of the array, n? For a successful search, about n/2 comparisons are performed on average, since we expect to find a randomly located item about halfway through the array. For an unsuccessful search, all n elements must be compared. Suppose that instead of floats we were searching for a name in the telephone book. There might be 500,000 names in the book, so n = 500,000. Do you normally look through around 250,000 names before you find the number? No; since the book is in sorted order, you can use a more efficient search that rules out most of the book at once. In the same way, if an array is sorted, a binary search splits the remaining search space in half with each comparison, which drastically reduces the number of comparisons made:
/* this function returns the index of 'target' in 'v', or -1 if
 * it isn't there; assumes v[] is in ascending sorted order */
int binary_search (float v[], int n, float target)
{
    int first, middle, last;

    /* 'first' and 'last' (both inclusive) mark the section of
     * the array where target must be, if it is present at all */
    first = 0;
    last = n - 1;

    while (first <= last) {
        /* find the middle of the section between 'first' and 'last'
         * (written this way so that first + last cannot overflow) */
        middle = first + (last - first) / 2;

        if (v[middle] < target)
            /* value is in "upper" half */
            first = middle + 1;
        else if (v[middle] > target)
            /* value is in "lower" half */
            last = middle - 1;
        else
            return middle;
    }

    /* didn't return anything? then it must not be there. */
    return -1;
}

This looks a lot longer, but it turns out to be much more efficient. The section of the array still to be searched is cut in half each time through the while loop, so the number of comparisons grows logarithmically with n rather than linearly. Consider the following table of the number of comparisons made by linear search and binary search for arrays of different sizes:
Size of array          Linear search (average case)    Binary search
-------------          ----------------------------    -------------
16 items               8 comparisons                   4 comparisons
64 items               32 comparisons                  6 comparisons
256 items              128 comparisons                 8 comparisons
65,536 items           32,768 comparisons              16 comparisons
4,000,000,000 items    2,000,000,000 comparisons       32 comparisons

Clearly, binary search is better than linear search, especially for large arrays. However, if the data is initially unsorted, we first have to sort it before we can use binary search. Sometimes that is worth it, and sometimes it isn't. Look at the selection sort above and work out how many comparisons it makes: find_minimum compares n-1 elements on the first pass, n-2 on the second, and so on, for n(n-1)/2 comparisons in all, or about half of n squared. The standard C library function qsort typically needs only on the order of n log n comparisons, but even that is a lot of work if we are only going to perform a few searches. Modern database systems go to a lot of trouble to keep their data in sorted order as it changes, so that searches are fast and the whole database never has to be sorted from scratch.