CPSC614
Computer Architecture
Computer Science Department
E.J. Kim Assignment 3, Due Mon, 03/28 Spring 2005
TA: Yuho Jin, Ping Luo
Written Part
1. The following loop is a dot product (initially F2 is 0) and contains a recurrence. Assume the pipeline latencies in the following table.
Inst. producing result    Inst. using result    Latency in cycles
FP ALU op                 Another FP op         3
FP ALU op                 Store double          2
Load double               FP ALU op             1
Load double               Store double          0
Branch                                          1
Int ALU                   Branch                1
foo:  LD      F0, 0(R1)
      LD      F4, 0(R2)
      MULD    F0, F0, F4
      ADDD    F2, F0, F2
      DADDIU  R1, R1, #-8
      DADDIU  R2, R2, #-8
      BNEZ    R1, foo
a. (10pts) Show a software pipelined version of this loop. You may omit the start-up and clean-up code.
b. (15pts) With a single-issue pipeline, unroll the loop a sufficient number of times to schedule it without any delays. Show the schedule after eliminating any redundant overhead instructions. You may use the back page.
c. (15pts) Show the schedule of the transformed code from b for a two-issue processor.
Problems from the textbook
4.2
a. (15pts)
b. (15pts)
4.8
a. (15pts)
4.11 (15pts)
Project One
Objective
This project is intended to help you become familiar with SimpleScalar 3.0, an execution-driven simulator that implements a very detailed out-of-order issue superscalar processor with a two-level memory system and speculative execution support.
After completing this project, you should understand SimpleScalar's configuration file and be able to change it according to your needs. You should also be able to read the output file and analyze the results.
System Requirement
A Linux operating system is required in order to use the pre-compiled little-endian Alpha ISA SPEC2000 binaries.
Procedure
I. Download and install SimpleScalar 3.0
(1) Download simplesim-3v0d.tar from http://www.simplescalar.com/
(2) Execute 'tar xvf simplesim-3v0d.tar'
(3) Read the README.txt file under the simplesim3.0 directory you have just untarred and compile the simulator according to the instructions.
(4) Once the simulator is built, execute 'sim-outorder' with no arguments. It prints all the configurable parameters of the out-of-order simulator and their default values. Look up the default branch predictor used in the simulator.
II. Get the benchmark
Check the following link for the available pre-compiled SPEC2000 Alpha binaries:
http://www.eecs.umich.edu/~chriswea/benchmarks/spec2000.html
Each student must choose one of the benchmarks listed below to do the simulations required in III. To choose the benchmark, take the last four digits of your student ID and divide that number by 12. The remainder is the index number of the benchmark you should run.
Index Name
1. Crafty00
2. Eon00
3. Gcc00
4. Perlbmk00
5. Vortex00
6. Applu00
7. Apsi00
8. Equake00
9. Fma3d00
10. Mgrid00
11. Sixtrack00
12. Swim00
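The selection rule above can be sketched in bash as follows. The four digits used here are a made-up placeholder, and `benchmark_index` is a hypothetical helper name. Note that a remainder of 0 has no entry in the list above, so the sketch only reports the remainder and leaves that case for you to check with the TAs.

```shell
#!/usr/bin/env bash
# Sketch: compute the benchmark index from the last four digits of a student ID.

benchmark_index() {
    local digits="$1"
    # The 10# prefix forces base 10, so leading zeros (e.g. 0089)
    # are not misread as an octal number.
    echo $(( 10#$digits % 12 ))
}

id_digits="4321"   # hypothetical last four digits, replace with your own
index=$(benchmark_index "$id_digits")
echo "remainder (benchmark index): $index"
if [ "$index" -eq 0 ]; then
    echo "remainder 0 is not in the index list; ask the TAs which benchmark applies"
fi
```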
III. Do the simulation with sim-outorder
1. Execute 'sim-outorder -redir:sim sim1.out -max:inst 500000000 -fastfwd 200000000 filename' (replace filename with a SPEC2000 benchmark name). The result is stored in sim1.out. (Note that the instruction limit is 500 million and the fast-forward count is 200 million.)
2. Change the branch predictor to a 2-level predictor and store the results in another file, named sim2.out.
3. Change the branch predictor to a combining predictor and store the results in sim3.out
4. Compare the three output files (sim1, sim2, and sim3.out). Which predictor is the best and why?
5. For the default branch predictor, change the return address stack (ras) size to 4 and 16 respectively, and keep the other parameters unchanged. Store the two sets of outputs in sim4.out and sim5.out. Compare them with sim1.out. Which one is the best and why?
6. Same as step 5, but this time change the branch target buffer (btb) instead of the ras. The default btb size is (512, 4). Now change it to (256, 8) and (256, 4) respectively. Store the results in sim6.out and sim7.out. Compare the results (sim1.out, sim6.out, and sim7.out) and interpret them.
7. Write all your analysis in a short report called report.txt (or report.doc or report.pdf).
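As a rough aid for steps 1 through 6, the following bash sketch prints (but does not execute) one command line per output file. It assumes the option names `-bpred`, `-bpred:ras`, and `-bpred:btb` with the values `2lev` and `comb`; confirm these against the parameter listing that 'sim-outorder' prints on your own build before relying on them. `print_runs` is a hypothetical helper name and `filename` is the same placeholder used in step 1.

```shell
#!/usr/bin/env bash
# Sketch: list the seven sim-outorder invocations for steps 1-6 as a dry run.
# Option names are assumptions to verify against sim-outorder's own listing.

COMMON="-max:inst 500000000 -fastfwd 200000000"

print_runs() {
    local bench="$1"   # benchmark name plus any arguments it needs
    echo "sim-outorder -redir:sim sim1.out $COMMON $bench"
    echo "sim-outorder -redir:sim sim2.out -bpred 2lev $COMMON $bench"
    echo "sim-outorder -redir:sim sim3.out -bpred comb $COMMON $bench"
    echo "sim-outorder -redir:sim sim4.out -bpred:ras 4 $COMMON $bench"
    echo "sim-outorder -redir:sim sim5.out -bpred:ras 16 $COMMON $bench"
    echo "sim-outorder -redir:sim sim6.out -bpred:btb 256 8 $COMMON $bench"
    echo "sim-outorder -redir:sim sim7.out -bpred:btb 256 4 $COMMON $bench"
}

print_runs "filename"
```

Printing the commands first makes it easy to check that each run changes exactly one parameter relative to the default run before spending simulation time on it.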
IV. Turn-in Instructions
Put all your output files and the report into one sub-directory named proj1. Execute 'tar cvf proj1.tar ./proj1'. Log on to csnet.cs.tamu.edu to turn in your proj1.tar. For detailed turn-in instructions, please read http://helpdesk.cs.tamu.edu/docs/csnet_turnin
Here is a summary of the files in proj1.tar.
sim1.out (sim-outorder default output)
sim2.out (sim-outorder with a 2-level predictor)
sim3.out (sim-outorder with a combining predictor)
sim4.out (sim-outorder with ras size = 4)
sim5.out (sim-outorder with ras size = 16)
sim6.out (sim-outorder with btb size = (256, 8))
sim7.out (sim-outorder with btb size = (256, 4))
report.txt (your analysis)
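When comparing the output files listed above, it can help to pull one statistic out of several files at once. The bash sketch below assumes each statistic appears on its own line in the form "name value # description" and uses `sim_IPC` as the example statistic name; check both assumptions against your own sim1.out. `show_stat` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print one named statistic from each of several output files.

show_stat() {
    local stat="$1"; shift
    local f
    for f in "$@"; do
        # Skip files that do not exist yet instead of failing.
        [ -r "$f" ] || { echo "$f: missing" >&2; continue; }
        # Assumed line layout: <name> <value> # <description>
        awk -v s="$stat" -v f="$f" '$1 == s { print f ": " $2 }' "$f"
    done
}

show_stat sim_IPC sim1.out sim2.out sim3.out
```

The same helper can be pointed at the branch predictor statistics in the output once you have identified their names in your own files.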
Reference
The SimpleScalar Tool Set, Version 2.0, Doug Burger and Todd Austin
Newsgroup
If you have any questions about the project or SimpleScalar, you can post them to the newsgroup tamu.classes.cpsc614. We will answer as soon as possible.
How to run the benchmark
Some of the benchmarks need input files and arguments to run correctly. In those cases, replace filename with the benchmark name followed by the extra information it needs. For example, to run mgrid00, use the following:
sim-outorder -redir:sim sim1.out -max:inst 500000000 -fastfwd 200000000 mgrid00 < mgrid.in
Here are some of the replacements for the benchmarks mentioned in procedure II. If it is not listed here, then just use the benchmark name.
mcf00 inp.in
eon00 chair.control.rushmeier chair.camera chair.surfaces chair.rushmeier.ppm ppm pixels_out.rushmeier
gcc00 integrate.i -o integrate.s
perlbmk00 scrabbl.pl scrabbl.in
vortex00 lendian1.raw
mgrid00 < mgrid.in
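The replacements above can be captured in a small lookup, sketched here in bash. `bench_command` is a hypothetical helper name, and the lowercase benchmark names follow the examples in this section. Any benchmark not listed falls through to its bare name, as described above. The sketch only prints the resulting command line.

```shell
#!/usr/bin/env bash
# Sketch: map a benchmark name to the text that replaces "filename".

bench_command() {
    case "$1" in
        mcf00)     echo "mcf00 inp.in" ;;
        eon00)     echo "eon00 chair.control.rushmeier chair.camera chair.surfaces chair.rushmeier.ppm ppm pixels_out.rushmeier" ;;
        gcc00)     echo "gcc00 integrate.i -o integrate.s" ;;
        perlbmk00) echo "perlbmk00 scrabbl.pl scrabbl.in" ;;
        vortex00)  echo "vortex00 lendian1.raw" ;;
        mgrid00)   echo "mgrid00 < mgrid.in" ;;
        *)         echo "$1" ;;   # not listed: use the benchmark name alone
    esac
}

echo "sim-outorder -redir:sim sim1.out -max:inst 500000000 -fastfwd 200000000 $(bench_command gcc00)"
```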
You can download the input set for each of the benchmarks from http://students.cs.tamu.edu/p0l3789.