Lecture 10

Stable Sorting Algorithms

When we sort elements in an array, we are often sorting records by a key, the value upon which there is an order. Other data in the record is satellite information. For example, consider a bank that records transactions with a record like this:
struct transaction {
	int	account_number,		/* number of the account */
		check_number;		/* number of the check */
	double	amount;			/* amount of transaction in dollars */
	date_t	date;			/* timestamp */
};
During the month, the bank might receive transactions, process them, and store them in a big array. Each month, the bank might sort the records using the account number as the key, then divide up the array among account numbers and send each customer his or her statement.

The bank would like each record on a customer's statement to appear in the order it was processed, i.e., the order the records occurred with respect to each other in the original array. With a sort like Quicksort, where elements may be swapped across vast distances of the array, this order may be lost.

A stable sort is one that leaves elements with the same key in the same order they occurred in the original array. This kind of sort is needed in the bank scenario above. If one is simply sorting numbers, it doesn't matter whether one uses a stable sort or not, but if satellite information is being carried around, it may matter.

The version of counting sort presented in class is not stable; it can't even be applied to the case of sorting records. However, the counting sort given in the book is stable.

What other sorts are stable?

Notably, Quicksort and heap sort are not stable. Note also that the easy-to-program "binary tree" sort, where you insert the elements into a binary search tree, then do an inorder traversal, is stable only if we insist on going right whenever inserting an element whose key is equal to that of the current node.

Radix Sort

Stability of a sort is also important for a linear-time sorting algorithm called radix sort. Radix sort builds on counting sort, applying a stable sort once per character position. Radix sort can sort an array of strings in lexicographic order (similar to alphabetic order, but over any ordered character set, such as ASCII or digits). The radix sort algorithm is stated very simply (d is the number of characters or digits per string):
Radix-Sort (A, d)
	for i in 1..d do
		use a stable sort to sort array A on character position i
	end for
This assumes that the least significant character is in the first position. The strings are sorted first by the least significant character, then by the next least significant character, and so forth, until we reach the most significant character. It is important that we use a stable sort so that the order established by the first sort is preserved when there are duplicate characters in the second sort, and so forth.

Let's look at an example of sorting integers represented as decimal strings. Remember that the least significant digit is the last in the decimal number because we write numbers from left to right, so digit 0 below is the rightmost digit:

A:
7349
9124
3978
7457
8565
sort on digit 0:
9124
8565
7457
3978
7349
sort on digit 1:
9124
7349
7457
8565
3978
sort on digit 2:
9124
7349
7457
8565
3978
sort on digit 3:
3978
7349
7457
8565
9124

Analysis

How long does radix sort take? There are now three parameters to this question: n, the size of the array; k, the size of the "alphabet" (character set); and l, the length of each string (called d in the pseudocode above). Since counting sort takes Θ(k + n), which is Θ(n) for small k, and we do l counting sorts, radix sort takes Θ(n l). If we can count on our strings to be of small constant size, this is just Θ(n). The asymptotic analysis is valid, but for strings of, say, length 80, with 256 possible characters in each position of each string, the value of n0 beyond which the actual running time of an O(n log n) sort exceeds that of radix sort may be very large. In practice, radix sort is often much faster than Quicksort etc. for specialized data like short strings.

The book's version of radix sort is somewhat unsatisfying. Here is a C version of radix sort. The program takes as input a file of strings and sorts them using radix sort. The most significant character is the first in the string, unlike the book's version:

/* radix sort */

#include <stdio.h>
#include <string.h>

#define LEN		72	/* maximum string length */
#define K		256	/* number of possible characters */
#define MAX_STRINGS	100000

/* this version of counting sort sorts an array A of pointers to strings using
 * the h'th character as the key.  the sorted pointers are returned in the
 * B array.  this is a stable sort.
 */

void counting_sort (unsigned char *A[], unsigned char *B[], int n, int h) {
	int		C[K];
	int		i, j;

	for (i=0; i<K; i++) C[i] = 0;

	/* all counts are now zero */

	for (j=0; j<n; j++) C[A[j][h]]++;

	/* C[i] is number of times character i occurs */

	for (i=1; i<K; i++) C[i] += C[i-1];

	/* C[i] is number of strings whose character is i or less */

	for (j=n-1; j>=0; j--) {

		/* scan A from last to first: among equal characters the
		 * last one takes the highest free slot, which is what
		 * keeps the sort stable
		 */
		/* -1 is because arrays start out at 0 in C */

		B[C[A[j][h]]-1] = A[j];
		C[A[j][h]]--;
	}
}

/* this radix sort sorts the n pointers to strings of size d in the array A 
 * by the strings they point to. 
 */
void radix_sort (unsigned char *A[], int n, int d) {
	int		i, j;

	/* static keeps this large scratch array off the stack;
	 * n must not exceed MAX_STRINGS
	 */
	static unsigned char	*B[MAX_STRINGS];

	/* we're assuming here that digit 0 is the highest order digit,
	 * like in real life, not like in the book 
	 */
	for (i=d-1; i>=0; i--) {

		/* stable sort A into B */

		counting_sort (A, B, n, i);

		/* copy the results back into A */

		for (j=0; j<n; j++) A[j] = B[j];
	}
}


/* main program */

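int main (void) {
	/* static: these arrays are far too large for a typical stack */
	static unsigned char	A[MAX_STRINGS][LEN+1];
	static unsigned char	*Ap[MAX_STRINGS];
	char		s[1000];
	int		i, n;

	/* get a bunch of strings into the array A.  fgets replaces gets,
	 * which cannot limit its input and was removed in C11; stop at
	 * end of file or when A is full
	 */

	for (n=0; n<MAX_STRINGS && fgets (s, sizeof s, stdin); ) {

		/* remove the trailing newline that fgets keeps */

		s[strcspn (s, "\n")] = 0;

		/* make sure the string is no longer than LEN */

		s[LEN] = 0;

		/* make sure the string is no shorter than LEN */

		while (strlen (s) < LEN) strcat (s, " ");

		/* put the string into the next position in A */

		strcpy ((char *) A[n++], s);
	}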

	/* make Ap an array of pointers to the strings in A */

	for (i=0; i<n; i++) Ap[i] = A[i];

	/* sort the pointers */

	radix_sort (Ap, n, LEN);

	/* print them out */

	for (i=0; i<n; i++) printf ("%s\n", Ap[i]);
}