CS 5513 Fall 2011, Homework 5
Due at the beginning of class, November 21, 2011.
- Multiple Issue. Case Study Problems 2.3, 2.4, 2.5.
- Register Renaming. Case Study Problem 2.9
- Dynamic Scheduling
- Case Study Problem 2.12 parts a through e.
- Solve the following problem that appeared on a previous Computer
Architecture qualifying exam. Students had 3 hours to do all problems
including this one. You have 2 weeks.
Assume we have a MIPS machine that uses the speculative version of
Tomasulo's algorithm. It can issue up to four instructions per clock
cycle, but branches must always issue alone.
You should make the following assumptions about the machine:
- There is one branch unit (BU) that does branch target address calculation
and conditional evaluation in one cycle, and it has one
reservation station (BR1)
- FP addition and subtraction execute in 3 cycles
- FP multiplication executes in 5 cycles
- FP division executes in 8 cycles, uses a floating point multiply unit,
and is unpipelined
- Integer operations and effective address addition execute in 1 cycle
- Data can be accessed in the data cache in 1 cycle
- The FPUs are fully pipelined (except for division!), and there are
2 FP adders (FA1,FA2),
1 FP multiplier (FM1)
- There are two ALUs (A1,A2), which handle effective address calculation
along with all other integer operations.
- There are 8 entries in the reorder buffer
- The Issue and Commit columns of the table can handle up to four
instructions (each) per cycle
- The CDB can handle up to two instructions per clock cycle
- At most one instruction may be accessing the data cache in each clock
cycle
- The remaining FUs all have two reservation stations are, named:
FMUL[1,2], FAD[1,2], ALU[1,2], ST[1,2], LD[1,2]
Copy and fill in the following table indicating the cycles in which
each instruction performs each step. Note that this table has been
augmented to allow you to track the funtional unit, reservation station
and ROB entry being used, which should ease your job of finding hazards.
Notice that the FU/EX entry indicates what functional unit is performing
computation for the instruction, and what cycles it begins on and ends on.
Thus we see that the first instruction uses the first FM1 to perform the
floating point multiply, and that the functional unit becomes busy at the
beginning of cycle 2 and finishes at the end of cycle 6.
Tomasulo Worksheet
Instruction
|
Issue
|
ROB/ResStat
|
FU/EX
|
MEM
|
CDB
|
Commit
|
MUL.D F1,F2,F3 | 1 | #1/FMUL1 | FM1/2-6 | -- | 7 | 8 |
L.D F2,8(R1) | | | | | | |
S.D F2,32(R2) | | | | | | |
LW R3,48(R1) | | | | | | |
L.D F3,0(R1) | | | | | | |
ADD.D F4,F3,F1 | | | | | | |
DIV.D F5,F2,F11 | | | | | | |
DADDIU R1,R1,#8 | | | | | | |
LW R5,0(R2) | | | | | | |
BNE R1,R5,LAB1 | | | | | | |
SUB.D F6,F1,F2 | | | | | | |
MUL.D F7,F2,F6 | | | | | | |
S.D F7,0(R8) | | | | | | |
Hint: This table is a hint to get you started:
Tomasulo Hint
Instruction
|
Issue
|
ROB/ResStat
|
FU/EX
|
MEM
|
CDB
|
Commit
|
MUL.D F1,F2,F3 | 1 | #1/FMUL1 | FM1/2-6 | -- | 7 | 8 |
L.D F2,8(R1) | 1 | #2/LD1 | A1/2-2 | 3 | 4 | 8 |
S.D F2,32(R2) | 1 | #3/ST1 | A2/2-2 | -- | -- | 8 |
LW R3,48(R1) | 1 | #4/LD2 | A1/3-3 | 4 | 5 | 8 |
L.D F3,0(R1) | 5 | #5/LD1 | A1/6-6 | 7 | 8 | 9 | |
ADD.D F4,F3,F1 | | | | | | |
DIV.D F5,F2,F11 | | | | | | |
DADDIU R1,R1,#8 | | | | | | |
LW R5,0(R2) | | | | | | |
BNE R1,R5,LAB1 | | | | | | |
SUB.D F6,F1,F2 | | | | | | |
MUL.D F7,F2,F6 | | | | | | |
S.D F7,0(R8) | | | | | | |
About the hint:
- The first four instructions can all issue in cycle 1; there are
no structural hazards between them and all of the register values are
available in the register file. The fifth instruction, L.D, waits because
we are only issuing up to 4 instructions per cycle, and L.D cannot issue
until a Load reservation station becomes available after cycle 4.
- The 2nd through 5th instructions all use the A[1-2] functional units
because they are all memory instructions that need to compute an effective
address.
- Why does the LW read memory in cycle 4 instead of 3? Two reasons: 1)
the effective address computation occurs in the 3rd cycle because A[1-2]
were in use before then by other instructions, and 2) even if we had ample
adder function units, we still want loads to access memory in order to
avoid data hazards through memory. (A more robust machine with memory
disambiguation might be able to do both loads simultaneously.)
- Why can't the second instruction, L.D, commit in cycle 5, one cycle
after it has broadcast its result to the CDB? Because we commit in order.
Remember, we issue in order, execute out of order, and commit in order.
- Why does the S.D instruction use A2 and not A1? Because A1 is busy
in cycle 2, so the issue logic chooses A2 instead.
- Why does the L.D F3,0(R1) instruction use the A1 function unit and
not the A2 functional unit? No reason. Both of them were available when
the instruction was issued, so the issue logic could have chosen either
one, and so can you. However, it's a good idea to choose the lowest
numbered functional unit that is available since that is probably what
your instructor is going to do and it makes it easier to grade :-)
- Why isn't the MEM field filled out for the S.D instruction? On page
121 of your book, see a similar table. MEM is an abbreviation for "Read
access at clock number." We don't really need to worry about writes to
memory, just reads because we need to make sure they happen in order with
respect to one another.
You may not work together on this assignment with other classmates or receive
assistance from any person other than your professor. Use only your book
and resources provided by your instructor as a resource for assistance.
Do not use solutions you find on the Internet or from any other source.
Your assignment should be nicely typed or word-processed. Turn in this
assignment at the beginning of class on November 21, 2011. You must turn
in the assignment on paper at the beginning of class.
Late assignments will not be accepted.