CPSC614 Computer Architecture

 

Texas A&M University

Computer Science Department

 

E.J. Kim                      Assignment 3, Due Mon, 03/28                Spring 2005

TA: Yuho Jin, Ping Luo                    

 

 

Written Part

 

1. The following loop is a dot product (initially F2 is 0) and contains a recurrence. Assume the pipeline latencies in the following table.

Inst. producing result     Inst. using result     Latency in cycles
FP ALU op                  Another FP op          3
FP ALU op                  Store double           2
Load double                FP ALU op              1
Load double                Store double           0
Branch                                            1
int ALU - branch                                  1

 

            Foo:     LD          F0, 0(R1)
                     LD          F4, 0(R2)
                     MULD        F0, F0, F4
                     ADDD        F2, F0, F2
                     DADDIU      R1, R1, #-8
                     DADDIU      R2, R2, #-8
                     BNEZ        R1, Foo

a. (10pts) Show a software pipelined version of this loop. You may omit the start-up and clean-up code.

b. (15pts) With a single-issue pipeline, unroll the loop a sufficient number of times to schedule it without any delays. Show the schedule after eliminating any redundant overhead instructions. You may use the back page.

c. (15pts) Show the schedule of the transformed code from b for a two-issue processor.
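
As a reference for what the loop computes (this is not a solution to parts a to c), the Python sketch below mirrors the seven instructions one for one. The dictionary-based memory model, the function name, and the sample addresses and values are assumptions made for illustration; none of them come from the assignment.

```python
def dot_product_loop(mem, r1, r2):
    """Mirror the loop at Foo. `mem` maps byte addresses to doubles (hypothetical model)."""
    f2 = 0.0                      # F2 is initially 0
    while True:
        f0 = mem[r1]              # LD     F0, 0(R1)
        f4 = mem[r2]              # LD     F4, 0(R2)
        f0 = f0 * f4              # MULD   F0, F0, F4
        f2 = f0 + f2              # ADDD   F2, F0, F2  (the recurrence on F2)
        r1 -= 8                   # DADDIU R1, R1, #-8
        r2 -= 8                   # DADDIU R2, R2, #-8
        if r1 == 0:               # BNEZ   R1, Foo falls through once R1 reaches 0
            return f2


# Made-up data: three doubles per array, 8 bytes apart.
mem = {8: 1.0, 16: 2.0, 24: 3.0, 108: 4.0, 116: 5.0, 124: 6.0}
result = dot_product_loop(mem, 24, 124)   # 3*6 + 2*5 + 1*4
print(result)
```

Each iteration's ADDD needs the F2 value produced by the previous iteration, which is the dependence that any software-pipelined or unrolled schedule has to respect.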

 

 

 

 

 

 

 

Problems from the textbook

 

4.2

a. (15pts)

b. (15pts)

4.8

a. (15pts)

4.11 (15pts)

 

 

Project One

 

Objective

 

This project is meant to help you become familiar with SimpleScalar 3.0, an execution-driven simulator that implements a very detailed out-of-order issue superscalar processor with a two-level memory system and speculative execution support.

Through this project, you should learn to understand SimpleScalar's configuration file and change it to suit your needs. You should also be able to read the output file and analyze the results.

 

System Requirement

 

A Linux operating system is needed in order to use the pre-compiled little-endian Alpha ISA SPEC2000 binaries.

 

Procedure

 

I. Download and install SimpleScalar 3.0

(1) Download simplesim-3v0d.tar from http://www.simplescalar.com/

(2) Execute 'tar xvf simplesim-3v0d.tar'

(3) Read the README.txt file under the simplesim3.0 directory you have just untarred and compile the simulator according to the instructions.

(4) After you build the simulator, execute 'sim-outorder' with no arguments. It prints all the configurable parameters of the out-of-order simulator and their default values. Look up the default branch predictor used in the simulator.

 

II. Get the benchmark

Check the following link for available pre-compiled SPEC2000 Alpha binaries:

http://www.eecs.umich.edu/~chriswea/benchmarks/spec2000.html

Each student must use one of the benchmarks listed below for the simulations required in III. To choose the benchmark, take the last four digits of your student ID and divide that number by 12. The remainder is the index number of the benchmark you should run.

 

Index    Name

1.         Crafty00

2.         Eon00

3.         Gcc00

4.         Perlbmk00

5.         Vortex00

6.         Applu00

7.         Apsi00

8.         Equake00

9.         Fma3d00

10.       Mgrid00

11.       Sixtrack00

12.       Swim00
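
The selection rule above can be sketched in Python as follows. The function names and the sample ID are hypothetical. Note that a remainder of 0 does not match any index in the list and the handout does not say how to treat it, so this sketch returns None in that case rather than guessing.

```python
# Index-to-benchmark table copied from the list above.
BENCHMARKS = {
    1: "Crafty00", 2: "Eon00", 3: "Gcc00", 4: "Perlbmk00",
    5: "Vortex00", 6: "Applu00", 7: "Apsi00", 8: "Equake00",
    9: "Fma3d00", 10: "Mgrid00", 11: "Sixtrack00", 12: "Swim00",
}


def benchmark_index(student_id):
    """Remainder of the last four digits of the ID divided by 12."""
    last_four = int(str(student_id)[-4:])
    return last_four % 12


def pick_benchmark(student_id):
    """Return the benchmark name, or None when the remainder is 0 (not covered by the list)."""
    return BENCHMARKS.get(benchmark_index(student_id))


# Hypothetical ID: last four digits 6789, and 6789 = 12 * 565 + 9.
print(benchmark_index("123456789"), pick_benchmark("123456789"))
```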

 

III. Do the simulation with sim-outorder

1. Execute 'sim-outorder -redir:sim sim1.out -max:inst 500000000 -fastfwd 200000000 filename' (replace filename with a SPEC2000 benchmark name); the result is stored in sim1.out. (Note: this simulates at most 500 million instructions after fast-forwarding past the first 200 million.)

2. Change the branch predictor to a 2-level predictor and store the results in another file, named sim2.out.

3. Change the branch predictor to a combining predictor and store the results in sim3.out.

4. Compare the three output files (sim1, sim2, and sim3.out). Which predictor is the best and why?

5. For the default branch predictor, change the return address stack (ras) size to 4 and 16 respectively, and keep the other parameters unchanged. Store the two sets of outputs in sim4.out and sim5.out. Compare them with sim1.out. Which one is the best and why?

6. Same as step 5, but this time change the branch target buffer (btb) instead of ras. The default btb size is (512, 4). Now change it to (256, 8) and (256, 4) respectively. Store the results into sim6.out and sim7.out. Compare the results (sim1, sim6, and sim7.out) and interpret them.

7. Write all your analysis in a short report called report.txt (or .doc, .pdf).
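
The seven runs in steps 1 to 6 differ only in a few options, so the command lines can be generated rather than typed by hand. The Python sketch below only builds and prints the commands; it does not run the simulator. The option names -bpred, -bpred:ras and -bpred:btb are given here as an assumption based on sim-outorder's usual parameter list; confirm them against the list printed in step I(4) before relying on them. The helper name and the RUNS table are hypothetical.

```python
# Options shared by every run (taken from step 1).
COMMON = ["-max:inst", "500000000", "-fastfwd", "200000000"]

# Extra options per output file. Assumed option names; check them against
# the parameter list that 'sim-outorder' prints when run without arguments.
RUNS = {
    "sim1.out": [],                          # default predictor
    "sim2.out": ["-bpred", "2lev"],          # 2-level predictor
    "sim3.out": ["-bpred", "comb"],          # combining predictor
    "sim4.out": ["-bpred:ras", "4"],         # ras size 4
    "sim5.out": ["-bpred:ras", "16"],        # ras size 16
    "sim6.out": ["-bpred:btb", "256", "8"],  # btb (256, 8)
    "sim7.out": ["-bpred:btb", "256", "4"],  # btb (256, 4)
}


def build_command(out_file, benchmark_args):
    """Return the argument list for one run; benchmark_args replaces 'filename'."""
    return (["sim-outorder", "-redir:sim", out_file]
            + COMMON + RUNS[out_file] + list(benchmark_args))


for out_file in RUNS:
    print(" ".join(build_command(out_file, ["filename"])))
```

Keeping the shared options in one place makes it less likely that two runs accidentally differ in more than the parameter under study.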

 

IV. Turn-in Instructions

Put all your output files and the report into one subdirectory named proj1. Execute 'tar cvf proj1.tar ./proj1'. Log on to csnet.cs.tamu.edu to turn in your proj1.tar. For detailed turn-in instructions, please read http://helpdesk.cs.tamu.edu/docs/csnet_turnin

Here is a summary of the files in proj1.tar.

sim1.out (sim-outorder default output)

sim2.out (sim-outorder with a 2-level predictor)

sim3.out (sim-outorder with a combining predictor)

sim4.out (sim-outorder with ras size = 4)

sim5.out (sim-outorder with ras size = 16)

sim6.out (sim-outorder with btb size = (256, 8))

sim7.out (sim-outorder with btb size = (256, 4))

report.txt (your analysis)
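
When comparing the output files listed above, it can help to pull the statistics into a table instead of reading each file by eye. The Python sketch below assumes each statistic appears on its own line as a name, a numeric value, and a '#' comment, which is how sim-outorder output is usually laid out; adjust the pattern if your files differ. The function names are hypothetical, and the sample text and its numbers are invented purely to exercise the parser. They are not simulation results.

```python
import re

# Assumed line layout: <name> <number> # <description>
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text):
    """Return {stat_name: value} for every line matching the assumed layout."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def load_stats(path):
    """Read one output file (e.g. sim1.out) and parse it."""
    with open(path) as handle:
        return parse_stats(handle.read())


# Invented sample in the assumed layout; the values are placeholders.
SAMPLE = """\
sim: ** simulation statistics **
sim_num_insn              500000000 # total number of instructions committed
sim_IPC                      1.2500 # instructions per cycle
bpred_bimod.bpred_dir_rate    0.9000 # direction-prediction rate
"""

stats = parse_stats(SAMPLE)
print(stats["sim_IPC"])
```

Calling load_stats on sim1.out through sim7.out and printing the same few statistics side by side gives a compact basis for the comparisons asked for in III.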

Reading

 

The SimpleScalar Tool Set 2.0, Doug Burger and Todd Austin

 

Newsgroup

 

If you have any questions about the project or SimpleScalar, you can post them in the newsgroup tamu.classes.cpsc614. We will answer as soon as possible.

 

How to run the benchmark

 

Some of the benchmarks need input files and parameters to run correctly. In those cases, you need to replace filename with the benchmark name followed by its arguments. For example, to run mgrid00, use the following:

 

sim-outorder -redir:sim sim1.out -max:inst 500000000 -fastfwd 200000000  mgrid00  < mgrid.in

 

Here are the replacements for some of the benchmarks mentioned in Procedure II. If a benchmark is not listed here, just use the benchmark name.

 

mcf00 inp.in

 

eon00 chair.control.rushmeier chair.camera chair.surfaces chair.rushmeier.ppm ppm pixels_out.rushmeier

 

gcc00 integrate.i -o integrate.s

 

perlbmk00 scrabbl.pl scrabbl.in

 

vortex00 lendian1.raw

 

mgrid00  < mgrid.in
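
The replacements above can be kept in a small lookup table so that the right arguments are appended automatically. In this Python sketch the table contents are copied from the list above, while the helper names and the way standard input is attached are assumptions made for illustration. The run helper is defined but not called here, since it needs the simulator, the binaries, and the input files to be present.

```python
import subprocess

# benchmark -> (command-line arguments, file to feed on standard input or None)
BENCH_ARGS = {
    "mcf00": (["inp.in"], None),
    "eon00": (["chair.control.rushmeier", "chair.camera", "chair.surfaces",
               "chair.rushmeier.ppm", "ppm", "pixels_out.rushmeier"], None),
    "gcc00": (["integrate.i", "-o", "integrate.s"], None),
    "perlbmk00": (["scrabbl.pl", "scrabbl.in"], None),
    "vortex00": (["lendian1.raw"], None),
    "mgrid00": ([], "mgrid.in"),   # mgrid00 reads its input via '< mgrid.in'
}


def benchmark_tail(name):
    """Return (what replaces 'filename', stdin file); unlisted names are used as-is."""
    args, stdin_file = BENCH_ARGS.get(name, ([], None))
    return [name] + args, stdin_file


def run(sim_prefix, name):
    """Run sim_prefix + benchmark tail, redirecting stdin when the table says so."""
    tail, stdin_file = benchmark_tail(name)
    if stdin_file is None:
        return subprocess.run(sim_prefix + tail)
    with open(stdin_file, "rb") as handle:
        return subprocess.run(sim_prefix + tail, stdin=handle)


print(benchmark_tail("gcc00"))
```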

 

You can download the input sets for the following benchmarks from http://students.cs.tamu.edu/p0l3789:

mcf00

eon00

gcc00

perlbmk00

vortex00

mgrid00