CPSC 120 lecture notes for 3/23/98

CPSC 120 Lecture Notes, Monday, March 23, 1998

ASSIGNMENTS/ANNOUNCEMENTS

Read Ch 7.
Quiz on Friday over Section 7.6 of Standish (Dynamic Memory Allocation)
Distinguished lecture: Ruzena Bajcsy, U. Penn, "Smart Cooperative Agents", 4:10 PM, Wed, Mar 25, room 124 Bright.
Another interesting lecture: Melba Crawford, UT Austin, "Digital Image Analysis: Applications to Earth Systems Science", 4:00 PM (not 4:10!) room 342 Zachry.

Chapter 7 presents more information about lists, their uses, and ways to implement them.

SPECIFYING THE LIST ADT

A list is a sequence of items with the operations

create - makes an empty list
empty - returns whether or not the list has no items in it
length - returns the number of items in the list
select(i) - returns i-th item in list
replace(i,x) - replace i-th item in list with item x
delete(x) - delete item x from the list (or a variant would be delete(i) - delete the i-th item)
insert(x) - insert item x into list (variants would be to indicate WHERE x is to be inserted, such as at the beginning, at the end, after a particular item, or before a particular item)

IMPLEMENTING THE LIST ADT

There are two obvious implementations of a list, either with an array, or with a linked list.

Array implementation: keep a counter indicating the next free index of the array. To insert at some location, shift the later items one position toward the end of the array to open a slot. To delete at some location, shift the later items one position toward the front to close the gap.
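
A minimal sketch of the array implementation, assuming a fixed capacity (growing the array on overflow is not shown). The names ArrayListSketch, insertAt, and deleteAt are made up for illustration and are not taken from Standish.

```java
// Sketch of an array-based list with a fixed capacity (an assumption).
class ArrayListSketch {
    private final Object[] items;
    private int count = 0;              // next free index = number of items

    ArrayListSketch(int capacity) {
        items = new Object[capacity];
    }

    int length() {
        return count;
    }

    Object select(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException("index " + i);
        return items[i];
    }

    // Insert x at index i: shift items i..count-1 one slot toward the end.
    void insertAt(int i, Object x) {
        if (count == items.length) throw new IllegalStateException("list is full");
        if (i < 0 || i > count) throw new IndexOutOfBoundsException("index " + i);
        for (int k = count; k > i; k--) {
            items[k] = items[k - 1];
        }
        items[i] = x;
        count++;
    }

    // Delete the item at index i: shift items i+1..count-1 one slot toward the front.
    void deleteAt(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException("index " + i);
        for (int k = i; k < count - 1; k++) {
            items[k] = items[k + 1];
        }
        items[count - 1] = null;        // clear the vacated slot
        count--;
    }
}
```

Both loops touch only the items after position i, which is where the O(n-i) cost for insert and delete in an array comes from.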

Linked list implementation: keep a count of the number of nodes and a pointer to the first node in the list. To select or replace the i-th item, traverse the list from the first node until you reach it.
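
A minimal sketch of the linked list implementation, showing why select and replace need a traversal. The names LinkedListSketch, ListNode, nodeAt, and insertFirst are hypothetical, chosen only for this example.

```java
// Sketch of a singly linked list that keeps a node count and a pointer
// to the first node. All names are illustrative.
class LinkedListSketch {
    private static class ListNode {
        Object item;
        ListNode link;

        ListNode(Object item, ListNode link) {
            this.item = item;
            this.link = link;
        }
    }

    private ListNode first = null;
    private int count = 0;

    boolean empty() {
        return count == 0;
    }

    int length() {
        return count;
    }

    // Follow i links starting from the first node: the O(i) traversal.
    private ListNode nodeAt(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException("index " + i);
        ListNode node = first;
        for (int k = 0; k < i; k++) {
            node = node.link;
        }
        return node;
    }

    Object select(int i) {
        return nodeAt(i).item;
    }

    void replace(int i, Object x) {
        nodeAt(i).item = x;
    }

    // Insert x at the beginning of the list in O(1) time.
    void insertFirst(Object x) {
        first = new ListNode(x, first);
        count++;
    }
}
```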

Let's review the pros and cons of these two implementations. Running time for various operations, on a sequence of n data items:

list operation	singly linked list	array
empty	O(1)	O(1)
length	O(1)	O(1)
select(i)	O(i)	O(1)
replace i-th item	O(i)	O(1)
delete(i)	O(i)	O(n-i)
insert(i)	O(i)	O(n-i)

The time for insert(i) in an array assumes no overflow occurs. If overflow occurs, then O(n) time is needed to copy the old array to the new, larger, one.

We talked before about the space requirements of these two representations. There is further discussion in the book.

If the array is an array of pointers to the items, then you have the space overhead of m pointers, where m is the size of the array allocated.
If the array is an array of the items themselves, then you have the space overhead of m-n (unused) items, where n is the current number of items in the list.
In both kinds of arrays, you also have the overhead of the counter (containing the next free index).
If you use a linked list, then the space overhead is for n "link" pointers, and the header information.

We can quantify the tradeoffs between the array and linked list representations.

Let p be the number of bytes to store a pointer
Let q be the number of bytes to store an item
Let m be the size (number of indices) of the allocated array.

Then to hold n items,

the array representation uses q*m bytes (independent of n).
the linked list representation uses n*(p+q) bytes.

The tradeoff point is when q*m = n*(p+q), that is, when n = q*m/(p+q). Some observations:

When n is smaller than q*m/(p+q), the linked list is better.
When n is larger than q*m/(p+q), the array is better.
When the item size, q, is much larger than the pointer size, p, the tradeoff point q*m/(p+q) is close to m, so the linked list representation beats the array representation for almost every n up to m.
When the item size, q, is close to the pointer size, p, the tradeoff point is about m/2, so the linked list representation beats the array representation only while the array would be less than about half full.
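
A small worked example of these formulas, with sample values that are assumptions rather than figures from the notes: p = 4 bytes and m = 100 slots. With q = 12 the tradeoff point is 12*100/16 = 75 items; with q = 4 it is 4*100/8 = 50 items. The class name SpaceTradeoff and its methods are hypothetical.

```java
class SpaceTradeoff {
    // Bytes used by an array of m item slots, each q bytes (independent of n).
    static long arrayBytes(long q, long m) {
        return q * m;
    }

    // Bytes used by a linked list of n nodes, each holding an item and a link.
    static long linkedListBytes(long n, long p, long q) {
        return n * (p + q);
    }

    // The value of n at which both representations use the same space.
    static double breakEven(long p, long q, long m) {
        return (double) (q * m) / (p + q);
    }

    public static void main(String[] args) {
        long p = 4, m = 100;                        // assumed sample values
        System.out.println(breakEven(p, 12, m));    // prints 75.0
        System.out.println(breakEven(p, 4, m));     // prints 50.0
        // At n = 50 and q = 12 the linked list is smaller: 800 vs 1200 bytes.
        System.out.println(linkedListBytes(50, p, 12) + " vs " + arrayBytes(12, m));
    }
}
```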

In this day and age, we are usually more concerned with running time than with space.

Some variations on linked lists that we mentioned before:

circular linked list -- last node has a pointer to the first node.
doubly linked list -- each node has a previous and next pointer (called two-way linked list in Standish)

GENERALIZED LISTS

A generalized list is a list of items, where each item might be a list itself.

Example: (a, b, (c, (d, e), f), g, (h, i)).

There are five elements in the (top level) list:

a
b
the list (c, (d, e), f)
g
the list (h, i)

Items which are not lists are called atoms (they cannot be further subdivided).

Here is Java code for a generalized list:

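class Node {
    Object item;
    Node link;

    // constructor: a new node holds the item and has no successor yet
    Node(Object item) {
        this.item = item;
        this.link = null;
    }
}

class GenList {

    private Node first;     // null when the list is empty

    // no explicit constructor is needed: first starts out as null

    // insert newItem at the beginning of the list
    void insert(Object newItem) {
        Node node = new Node(newItem);          // call Node constructor
        node.link = first;                      // keep the rest of the list
        first = node;
    }

    void print() {
        System.out.print('(');
        Node node = first;
        while (node != null) {
            if (node.item instanceof GenList) { // is item of type GenList?
                ((GenList) node.item).print();  // cast to type GenList,
            }                                   // recursive call!
            else {
                System.out.print(node.item);    // every type has a
            }                                   // toString method
            if (node.link != null) {
                System.out.print(", ");         // separator between items
            }
            node = node.link;                   // advance to the next node
        }
        System.out.print(')');
    }
}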

Notice:

the use of the instanceof operator: "o instanceof C" returns true if
- object o is an instance of class C
- object o implements interface C
- object o is an instance of a subclass of C
- object o is an instance of a subclass of some class that implements interface C
casting node.item to type GenList, if appropriate
recursive call of the GenList method print
implicit use of the toString method of every class, in the call to System.out.print

(Note: don't confuse the print method of System.out with the print method we are defining for class GenList.)

So this print method is recursive. How do we know that it is well-defined, and we won't get an infinite loop? I.e., what is the stopping case, and how do we know we are getting closer to the stopping case?

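The method has a while loop that steps through all the (top-level) items in the current list. If an item is not a generalized list, the method simply prints it. If an item is itself a generalized list, the method recursively calls print on that item. Within one call, the stopping case is reaching the end of the list, and each pass through the while loop moves one node closer to it. The recursion itself stops because each recursive call works on a sublist nested one level deeper, and a list built without cycles has only finitely many levels of nesting; atoms trigger no further calls.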

Warning! If you have a "cycle" in your generalized list, you'll have a problem: print will recurse without end (in practice, until the stack overflows). For instance, suppose that you have a generalized list L with three items in it, and the second item is a generalized list which happens to be L itself!
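
One possible guard against this problem, sketched under assumptions that are not part of the notes: the generalized list is represented as a java.util.List whose elements are either atoms or other lists, and the hypothetical class CycleSafePrinter remembers, by object identity, which lists are currently being formatted. If it meets one of them again, it writes a marker instead of recursing.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class CycleSafePrinter {
    // Returns the text form of a generalized list, writing <cycle> where
    // a list contains itself (directly or through a sublist).
    static String format(List<?> list) {
        StringBuffer out = new StringBuffer();
        // Identity-based set: compares lists with ==, never with equals().
        Set<Object> onPath =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        format(list, onPath, out);
        return out.toString();
    }

    private static void format(List<?> list, Set<Object> onPath, StringBuffer out) {
        if (!onPath.add(list)) {        // already being formatted: a cycle
            out.append("<cycle>");
            return;
        }
        out.append('(');
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) {
                out.append(", ");
            }
            Object item = list.get(i);
            if (item instanceof List) {
                format((List<?>) item, onPath, out);    // recursive call
            } else {
                out.append(item);                       // atom
            }
        }
        out.append(')');
        onPath.remove(list);            // finished with this list
    }
}
```

Because a list is removed from the set once it has been formatted, a sublist that merely appears twice (without containing itself) is still printed in full both times.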

Another issue to be careful about is whether or not you have shared sublists. For instance, you could have the generalized list ( (x, y), b, (x, y) ), where the sublists (x, y) are actually the same object. If you change the first sublist, you will automatically change the second sublist in this case. You need to be very careful about whether you want to have shared sublists or not.
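
The difference between a shared sublist and a copied one can be seen in a short sketch. It assumes the generalized list is represented with java.util.ArrayList objects rather than the GenList class; SharedSublistDemo, build, and changeFirstSublist are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SharedSublistDemo {
    // Builds ((x, y), b, (x, y)). If share is true, both sublists are the
    // same object; otherwise the second sublist is a separate copy.
    static List<Object> build(boolean share) {
        List<Object> xy = new ArrayList<Object>();
        xy.add("x");
        xy.add("y");
        List<Object> top = new ArrayList<Object>();
        top.add(xy);
        top.add("b");
        top.add(share ? xy : new ArrayList<Object>(xy));
        return top;
    }

    // Replaces the first item of the first sublist.
    @SuppressWarnings("unchecked")
    static void changeFirstSublist(List<Object> top, Object newItem) {
        ((List<Object>) top.get(0)).set(0, newItem);
    }

    public static void main(String[] args) {
        List<Object> shared = build(true);
        changeFirstSublist(shared, "z");
        System.out.println(shared);     // prints [[z, y], b, [z, y]]

        List<Object> copied = build(false);
        changeFirstSublist(copied, "z");
        System.out.println(copied);     // prints [[z, y], b, [x, y]]
    }
}
```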

APPLICATION OF GENERALIZED LISTS: LISP

Generalized lists are highly flexible and are good for applications where data structures grow and shrink in highly unpredictable ways during execution.

Generalized lists are the key structuring paradigm in the programming language LISP (LISt Processing language). LISP has been, and still is, very popular in the artificial intelligence community.

LISP is a functional language, which, loosely speaking, means that every statement is a function (in the mathematical sense, of taking some arguments and producing a result).

Each function call is represented as a list, with the name of the function coming first, and the arguments coming after it:

( FUNCTION ARG1 ARG2 ... )

Each argument could itself be the result of invoking some other function with its own list of arguments, etc.

We will not be talking about most of LISP (you will see it in the AI course and probably the programming languages course). However, let's see how we can take this idea and apply it to evaluating arithmetic expressions.

We now have prefix notation (as opposed to postfix), and we use parentheses to delimit the sublists:

( * (+ 3 4) (+ 8 6) )

is equal to (3 + 4) * (8 + 6). Using the parentheses is useful if we want to allow different numbers of arguments. For instance, why not allow plus to have more than 2 arguments?

( * (+ 3 4 5) (+ 8 6) )
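
A minimal sketch of how such an expression could be evaluated recursively. It assumes the expression has already been parsed into nested java.util.List objects (parsing the text is not shown), with the operator as a String in the first position and Integer atoms as numbers; only + and * are handled. PrefixEval is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

class PrefixEval {
    // Evaluates an atom (Integer) or a list of the form (op arg1 arg2 ...).
    static int eval(Object expr) {
        if (expr instanceof Integer) {
            return ((Integer) expr).intValue();         // atom: its own value
        }
        if (!(expr instanceof List)) {
            throw new IllegalArgumentException("not a number or list: " + expr);
        }
        List<?> list = (List<?>) expr;
        String op = (String) list.get(0);               // function name comes first
        boolean isPlus = op.equals("+");
        if (!isPlus && !op.equals("*")) {
            throw new IllegalArgumentException("unknown operator " + op);
        }
        int result = isPlus ? 0 : 1;                    // identity for + or *
        for (int i = 1; i < list.size(); i++) {
            int arg = eval(list.get(i));                // argument may be a sublist
            result = isPlus ? result + arg : result * arg;
        }
        return result;
    }

    public static void main(String[] args) {
        // ( * (+ 3 4) (+ 8 6) )
        Object e1 = Arrays.<Object>asList("*",
            Arrays.<Object>asList("+", 3, 4),
            Arrays.<Object>asList("+", 8, 6));
        System.out.println(eval(e1));   // prints 98

        // ( * (+ 3 4 5) (+ 8 6) )
        Object e2 = Arrays.<Object>asList("*",
            Arrays.<Object>asList("+", 3, 4, 5),
            Arrays.<Object>asList("+", 8, 6));
        System.out.println(eval(e2));   // prints 168
    }
}
```

Because the loop simply runs over however many arguments the list holds, allowing plus to take more than two arguments needs no extra code.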

STRINGS IN JAVA

Java differentiates between Strings and StringBuffers. A String object is immutable, whereas a StringBuffer object can be changed. They are both a kind of list.

Some useful methods of Strings:

String s1 = "hello";		// s1 refers to a String object "hello"

int len = s1.length();		// len contains 5

String s2 = new String(s1);	// s2 refers to a String object "hello";
				// 	s1 is unaffected;
				//	constructor can also take as a
				//	parameter a character array or
				//	byte array

int i = 1;			// any valid index, 0 through s1.length()-1
char c = s1.charAt(i);		// returns character at index i in the
				//	string to which s1 refers;
				//	start counting at 0

s1 = String.valueOf(3);		// s1 refers to the String object "3"
s1 = String.valueOf(4.5);	// s1 refers to the String object "4.5"
s1 = String.valueOf('&');	// s1 refers to the String object "&" 
				//	(not a char)
boolean b = false;
s1 = String.valueOf(b);		// s1 refers to the String object "false"

s1 = "prepare";
s2 = "par";
int m = s1.indexOf(s2);		// m contains 3, since the leftmost
				//	occurrence of "par" inside "prepare"
				//	begins at index 3 of "prepare"

String s3 = s1 + s2;		// s3 refers to String object "preparepar";
				//	s1 and s2 are unaffected
s3 = s2.concat(s1);		// s3 refers to String object "parprepare";
				//	s1 and s2 are unaffected

There are no methods that change an existing String.

If you want to change the characters in a String, use a StringBuffer. The key features of a StringBuffer are that you can

change a character at a particular index in the string buffer
append a string at the end of a string buffer
insert a string somewhere in the middle of a string buffer

(There are a variety of ways to do these -- check the documentation, either on-line or in a reference book.)
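
A short sketch of the three operations, using the standard StringBuffer methods setCharAt, append, and insert. The class name StringBufferDemo and the sample text are made up for illustration.

```java
class StringBufferDemo {
    static String demo() {
        StringBuffer sb = new StringBuffer("hello");
        sb.setCharAt(0, 'j');       // change a character: "jello"
        sb.append(" world");        // append at the end: "jello world"
        sb.insert(5, ",");          // insert in the middle: "jello, world"
        return sb.toString();       // convert back to an immutable String
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints jello, world
    }
}
```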

The StringBuffer class can be implemented using an array of characters. The ideas are not complicated. You just have to create new arrays and do copying at appropriate times, so it is not particularly fast to do these operations. See Section 7.5 for some sample code.
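
A minimal sketch of that idea. This is not the code from Section 7.5 and not how the library class is actually written; CharBufferSketch and its initial capacity of 4 are assumptions chosen to make the copying visible.

```java
// Sketch of a string buffer backed by an array of characters.
class CharBufferSketch {
    private char[] chars = new char[4];     // small on purpose, to force growth
    private int count = 0;                  // number of characters in use

    // Make sure the array can hold at least 'needed' characters,
    // copying everything to a new, larger array if it cannot.
    private void ensureCapacity(int needed) {
        if (needed <= chars.length) {
            return;
        }
        char[] bigger = new char[Math.max(needed, 2 * chars.length)];
        System.arraycopy(chars, 0, bigger, 0, count);
        chars = bigger;
    }

    void append(String s) {
        insert(count, s);
    }

    // Insert s so that its first character lands at the given index.
    void insert(int index, String s) {
        if (index < 0 || index > count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int len = s.length();
        ensureCapacity(count + len);
        // Shift the tail toward the end to open a gap of len characters.
        System.arraycopy(chars, index, chars, index + len, count - index);
        s.getChars(0, len, chars, index);   // copy s into the gap
        count += len;
    }

    void setCharAt(int index, char c) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        chars[index] = c;
    }

    public String toString() {
        return new String(chars, 0, count);
    }
}
```

Every insert copies the tail of the array, and every overflow copies the whole array, which is the copying cost the paragraph above refers to.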