CS 2733 Homework 4

CS 2733 Fall 2007
Computer Organization II
Homework 4

Multiple Precision Addition

Read these instructions carefully. For this assignment, you will modify an assembly language subroutine to perform multiple precision addition more efficiently. Your new subroutine will be used in a program that computes large Fibonacci numbers.

Multiple Precision Integer Arithmetic

Computer instruction sets such as x86 allow only a limited number of integers to be represented. The 32-bit "longword" format, equivalent to int in C, C++, and Java, can only represent 2³² integers. For unsigned integers, the values that can be represented are from 0 through 4,294,967,295. If we want to represent integers with, say, 320 bits, we would have to do something special. You may have dealt with some of this in your third assignment if your counters went over 2³².

We can represent multiple precision integers with arrays of normal integers. Each integer in the array would represent part of the multiple precision integer, just as each digit in a decimal number represents part of the number. But rather than doing arithmetic in base 10, as we do with decimal numbers, we do arithmetic in base 2³² for multiple precision integers.

Array Representation

Our implementation of multiple precision integers will be little-endian, i.e., the 0'th element of the array will contain the least significant component of the integer. For instance, the following array:

int A[] = { 0x12345678, 0xabcdef01, 0x87654321 };

represents the hexadecimal value 0x87654321abcdef0112345678 as an array of size 3. For this assignments, all arrays will be of size 10, but you can imagine extending this representation to arrays of varying sizes.

Coding Support for Multiple Precision Arithmetic

To do multiple precision arithmetic, we need to define several functions for doing things like addition, converting from normal integers to multiple precision integers, printing multiple precision integers, etc. Here is C code for doing multiple precision addition:

#define N	10

/* add multiple-precision integers in arrays A and B, placing result into
 * multiple-precision integer in array C.  each array is of size 10.
 * returns 1 if there is a carry (i.e. overflow), 0 otherwise
 */

int add (unsigned int A[], unsigned int B[], unsigned int C[]) {
        int     i, carry;
        unsigned int c;

        /* initially, carry is zero */

        carry = 0;

        /* go through each element from least to most significant */

        for (i=0; i<N; i++) {

                /* add the i'th elements of A and B with the carry */

                c = A[i] + B[i] + carry;

                /* there is a carry out of c, i.e. c overflowed, if
                 * either operand is greater than the sum
                 */

                carry = (A[i] > c) || (B[i] > c);

                /* place the partial sum into the i'th element of C */

                C[i] = c;
        }

        /* return carry (i.e. unsigned overflow) */

        return carry;
}

This code is a nice general algorithm in terms of the #defined constant N which we can change to support higher precision numbers. But if we are only interested in 320-bit numbers (i.e. 10 32 bit integers) then we can unroll the for loop, producing the following C code:

int add (unsigned int A[], unsigned int B[], unsigned int C[]) {
        int     carry;
        unsigned int c;

        carry = 0;
        c = A[0] + B[0] + carry;
        carry = (A[0] > c) || (B[0] > c);
        C[0] = c;
        c = A[1] + B[1] + carry;
        carry = (A[1] > c) || (B[1] > c);
        C[1] = c;
        c = A[2] + B[2] + carry;
        carry = (A[2] > c) || (B[2] > c);
        C[2] = c;
        c = A[3] + B[3] + carry;
        carry = (A[3] > c) || (B[3] > c);
        C[3] = c;
        c = A[4] + B[4] + carry;
        carry = (A[4] > c) || (B[4] > c);
        C[4] = c;
        c = A[5] + B[5] + carry;
        carry = (A[5] > c) || (B[5] > c);
        C[5] = c;
        c = A[6] + B[6] + carry;
        carry = (A[6] > c) || (B[6] > c);
        C[6] = c;
        c = A[7] + B[7] + carry;
        carry = (A[7] > c) || (B[7] > c);
        C[7] = c;
        c = A[8] + B[8] + carry;
        carry = (A[8] > c) || (B[8] > c);
        C[8] = c;
        c = A[9] + B[9] + carry;
        carry = (A[9] > c) || (B[9] > c);
        C[9] = c;
        return carry;
}

Here, the loop letting i go from 0 through 9 has been replaced by 10 identical copies of the loop body. This will enable the code to avoid the loop overhead and work faster, at the cost of making the function less general since it now only works for 320-bit numbers.

Click here to see the assembly code produced by gcc -O3 -S for this unrolled version of add. Unfortunately, this code is not as efficient as it could be, even though the compiler tried its best. In particular, the compiler does not realize that it can use the carry flag (CF) to represent the carry variable and the "add with carry longword" (adcl) instruction to add the elements of A and B.

Modifying the subroutine

Take the assembly code for the unrolled add function and modify it to satisfy the following:

Use adcl to add the elements of the array. By using adcl, you can use the carry flag instead of representing the carry using a separate variable. Note: The add function must still work correctly.
Put the value of the carry flag into the %eax register before the function returns.
Do not use any jumps or branches or calls at all in the code. That is, you should get rid of all the branches (e.g. ja, jbe, etc.) in the original assembly file and don't put in any new branches. (ret is technically a branch but you must leave that in so that the function can return.) Why this requirement? Because if you do the assignment the right way, there won't be any need for branches; so if you have branches, then you know you're not doing it right.
Call your new assembly file add-adcl.S.

Hints

Make sure to use addl instead of adcl just for the first addition (i.e. A[0] + B[0]) since there is no carry-in to this addition.
To put the value of the carry flag into %eax without using branches, you can do this: set %eax equal to zero, then add %eax to itself using the adcl instruction. It will add the carry flag to 0 + 0, so %eax will be 1 if and only if the carry flag is 1, or 0 otherwise. But be careful! Don't use xorl on %eax to set it equal to zero as we have before, because this instruction will modify the carry flag. Use movl $0, %eax instead.
You will have to understand (more or less) the assembly code for the unrolled add function, then rewrite a large part of it using adcl and some movl instructions to accomplish the same task.
Make sure you understand which instructions modify the carry flag and which don't. For example, movl and leal don't modify the carry flag, but xorl, addl, and adcl do.
Understanding adcl is important, but you also need to understand how to work with arrays of integers in assembly to get this done. You will need to use movl with addressing modes to load and store values to and from the memory that the three arrays occupy.

Computing Fibonacci Numbers

Click on fib.c to obtain a program that accepts a single integer command line parameter n, then prints the first n Fibonacci numbers in hexadecimal to standard output. (Google "fibonacci numbers" if you don't know what Fibonacci numbers are; it doesn't really matter, except that they can be computed using only unsigned integer addition and they get very large.) The program uses the add subroutine as the main code to compute Fibonacci numbers. When you are developing your subroutine, use the fib.c program to make sure your subroutine works.

Compile the program using both the old add subroutine and your new improved add subroutine:

gcc -O3 -o fib.old fib.c add-unrolled.s
gcc -O3 -o fib.new fib.c add-adcl.S

As a test, try listing the first 400 Fibonacci numbers by invoking each program:

./fib.old 400
./fib.new 400

Each program should produce exactly the same output. By the way, the 400th Fibonacci number is hexadecimal is this:

17322a2cfd320a23266116c4c2c95b3feea3e57fa3d9dfe8b8591e1d72120f26c6fadb

If your program doesn't produce that number when you use your version of the add function, then your add function is incorrect. (Note that you do not have to turn in anything related to Fibonacci numbers, but your assignment will be graded in terms of the fib.c program so make sure you use it to test your code.)

What to turn in

Email your version of add-adcl.S to the professor. Make sure to put comments in the assembly file including your name as well as comments that indicate how you modified the file. Do not email any other code or text, just the single assembly file.

Due Date

The due date for this program is October 18, 2007, at 11:59pm.

CS 2733 Fall 2007 Computer Organization II Homework 4 Multiple Precision Addition