CSCE 312 Fall 2021
Computer Organization
Homework 6
Fast Processor Simulation

Read these instructions carefully. There is a considerable amount of information for you to read before the part about what you're supposed to do. For this assignment you will optimize code that simulates a simple CPU with memory and output. Our simulated CPU fetches instructions from the memory (which may be thought of as part of the CPU) and executes them. A special instruction, HALT, tells the CPU to stop executing altogether.

Our simulated CPU stores its instructions in an array of C++ structs. It can hold up to 2000 instructions. It represents memory as another array of 10000 ints called memory[]. Our CPU has special integer variable in it called the Program Counter. This is the index into the instruction array where the currently executing instruction is found. Our CPU simulator uses the following algorithm to execute programs:

  1. Read the file named on the command line into the array of instructions.
  2. Set the Program Counter (PC) equal to 0.
  3. Continue with steps 4 and 5 until "halt" is encountered.
  4. Fetch the instruction at the PC element of the array, then add one to the PC.
  5. Execute the instruction, possibly changing the PC again. If the instruction is the "halt" instruction, break out of the fetch/execute loop and exit the program, otherwise continue at step 3.

Program Files

Three C++ files currently implement the simulator: Read and understand those program files. To compile the program, download those files into your Linux account and type the following command from the Unix shell:
g++ -O3 -o sim sim.cc simulate.cc
This will leave an executable called sim in your directory.

Instruction Format

The instructions are stored in the following format:
struct instruction {
        int     type,		/* type of instruction */
                addr1,		/* first address */
                addr2,		/* second address */
                data;		/* data for the instruction */
};

instruction program[2000];
So the program[] array holds the CPU's program. The first field, type, indicates the type of instruction to be executed. For instance, if type is 7, then the instruction is a "multiply" instruction. The second and third fields, addr1 and addr2, are "addresses." These are usually indices into the memory array, but in some special cases they will be indices into the instruction array. The last field, data, is used to hold miscellaneous information a particular instruction might need. Here is a table of all the instructions in our simple CPU. Each instruction is given by
CPU Instructions
type ValueMnemonicDescription
0 NOP No Operation; just does nothing
1 STORE_CONST Store constant: memory[addr1] = data
2 MOVE Move memory: memory[addr1] = memory[addr2]
3 MOVEID Move memory indirect to direct: memory[memory[addr1]] = memory[addr2]
4 MOVEDI Move memory direct to indirect: memory[addr1] = memory[memory[addr2]]
5 ADD_CONST Add constant: memory[addr1] += data
6 ADD Add: memory[addr1] += memory[addr2]
7 MUL Multiply: memory[addr1] *= memory[addr2]
8 DIV Divide: memory[addr1] /= memory[addr2]
9 MOD Modulus (i.e., remainder): memory[addr1] %= memory[addr2]
10 OUTPUT Convert memory[addr1] to char and print to standard output. putchar(memory[addr1]) will do this.
11 COMPARE Compare memory[addr1] to memory[addr2];
if memory[addr1] < memory[addr2] then memory[addr1] = -1;
if memory[addr1] > memory[addr2] then memory[addr1] = 1;
if memory[addr1] is equal to memory[addr2] then memory[addr1] = 0. Note that this destroys one of the items being compared, and that's okay.
12 JUMP "Jump" to another instruction; Program Counter = addr1
13 COND_JUMP Conditional Jump: if memory[addr1] is equal to data, then Program Counter = addr2; otherwise, continue as usual.
14 STOREPC Store Program Counter to memory: memory[addr1] = Program Counter
15 LOADPC Load Program Counter from memory: Program Counter = memory[addr1]
16 HALT Halt; stop execution of instructions.

Sample Programs for the Simulator

There are four programs provided for use with the simulator. After you have compiled the simulator, download the programs and simulate them using the sim program. The sim programs takes one command line argument: the name of the file containing the simulated program. The following programs are provided: Note: Your web browser may download these binary files incorrectly. Use the wget command to download these files into your Linux account, e.g.:
wget https://people.engr.tamu.edu/djimenez/classes/312/hw6/hello.bin
wget https://people.engr.tamu.edu/djimenez/classes/312/hw6/test.bin
wget https://people.engr.tamu.edu/djimenez/classes/312/hw6/primes.bin
wget https://people.engr.tamu.edu/djimenez/classes/312/hw6/bubble.bin

Compiler Implementation of Switch Statement

The switch statement in simulate.cc is the heart of the simulator. GCC uses a neat trick to implement this switch statement: instead of a sequence of cmpl and je instructions, as you might expect, GCC uses a jump table. The type field of the instruction being simulated is used as an index into a table of addresses of x86 code, each of which correspond to the case statement implementing a particular simulated instruction. When the switch statement is executed, the address of the code for the correct case is loaded from the table and then jumped to using an indirect jump instruction. This technique cuts down on the number of instructions executed, theoretically improving performance.

What You're Supposed To Do

As clever as the jump table idea sounds, it can be inefficient on modern architectures because indirect jumps incur unavoidable pipeline stalls, slowing execution. (The problem is that instructions are fetched into and executed out of a queue. When an indirect jump is encountered, the processor temporarily doesn't know where to get instructions for the queue, so once it drains there are temporarily no instructions left to execute and performance suffers.) Your job is to optimize the simulate() function so that it is more efficient. Try the following two optimizations:

First Optimiztion: Linear Search Instead of Jump Table

Create a new C++ file called simulate-1.cc in which you will write your new simulate() function. Replace the switch and case statements with a sequence of if statements. That is, instead of the following:
	switch (type) {
	case 0: a;
	case 1: b;
	case 2: c;
	...
	}
do the following:
	if (type == 0) a;
	else if (type == 1) b;
	else if (type == 2) c;
	...
Compile the new function with the following command:
g++ -O3 -o sim-1 sim.cc simulate-1.cc
Time the new program versus the old program on the primes.bin program, redirecting output to /dev/null so that the time to do the output doesn't obscure the timing process:
time ./sim primes.bin > /dev/null
time ./sim-1 primes.bin > /dev/null
Which execution took longer? Was the second one faster? By how much?

Second Optimization: Ordered Linear Search

The first optimization imposes an ordering on the execution of the if statements. First, we check to see if the type is 0. Then we see if it is 1. Then we see if it is 2. And so forth. By the time we get to the last if statement, we have already checked for 16 values. However, not all instructions appear with the same frequency. It makes more sense to check first for the most common instruction, then the next most common, and so forth.

Make a new program file called simulate-2.cc that will contain another new version of the simulate() function. Temporarily modify the simulator program to count, for each instruction type, how often that type of instruction is executed, and print the instruction counts at the end of simulation. Run the modified simulator on the primes.bin program to collect the frequencies of the instruction types. Then reorder the if statements in the simulate() function to check the type fields in decreasing order of frequency, so that the most common instruction is checked first, etc.

Compile the new program with the following command:

g++ -O3 -o sim-2 sim.cc simulate-2.cc
Again, time the new program versus the old program on the primes.bin program:
time ./sim primes.bin > /dev/null
time ./sim-2 primes.bin > /dev/null
Which execution took longer? Was the second one faster? By how much? Also run your optimized simulator on the bubble.bin program to measure the speedup.

Note that you are not required to change the assembly language, but that your choice of C++ statements has a large impact on the generated assembly language. Also, this assignment sort-of shows you how a microprocessor works, so you get a little extra computer architecture learning at no extra charge.

What To Turn In

Turn in the following items:
  1. Your version of simulate-1.cc.
  2. Your version of simulate-2.cc.
  3. A short writeup explaining your results. Address the following questions:
Your TA will have instructions on how to turn in this assignment on Canvas.

Due Date

This assignment is due at 11:59pm on December 7, 2021.