CS 5513 Fall 2011, Homework 5

Due at the beginning of class, November 21, 2011.

Multiple Issue. Case Study Problems 2.3, 2.4, 2.5.
Register Renaming. Case Study Problem 2.9

Dynamic Scheduling

Case Study Problem 2.12 parts a through e.

Solve the following problem that appeared on a previous Computer Architecture qualifying exam. Students had 3 hours to do all problems including this one. You have 2 weeks.

Assume we have a MIPS machine that uses the speculative version of Tomasulo's algorithm. It can issue up to four instructions per clock cycle, but branches must always issue alone. You should make the following assumptions about the machine:

There is one branch unit (BU) that does branch target address calculation and conditional evaluation in one cycle, and it has one reservation station (BR1)
FP addition and subtraction execute in 3 cycles
FP multiplication executes in 5 cycles
FP division executes in 8 cycles, uses a floating point multiply unit, and is unpipelined
Integer operations and effective address addition execute in 1 cycle
Data can be accessed in the data cache in 1 cycle
The FPUs are fully pipelined (except for division!), and there are 2 FP adders (FA1,FA2), 1 FP multiplier (FM1)
There are two ALUs (A1,A2), which handle effective address calculation along with all other integer operations.
There are 8 entries in the reorder buffer
The Issue and Commit columns of the table can handle up to four instructions (each) per cycle
The CDB can handle up to two instructions per clock cycle
At most one instruction may be accessing the data cache in each clock cycle
The remaining FUs all have two reservation stations are, named: FMUL[1,2], FAD[1,2], ALU[1,2], ST[1,2], LD[1,2]

Copy and fill in the following table indicating the cycles in which each instruction performs each step. Note that this table has been augmented to allow you to track the funtional unit, reservation station and ROB entry being used, which should ease your job of finding hazards. Notice that the FU/EX entry indicates what functional unit is performing computation for the instruction, and what cycles it begins on and ends on. Thus we see that the first instruction uses the first FM1 to perform the floating point multiply, and that the functional unit becomes busy at the beginning of cycle 2 and finishes at the end of cycle 6.

**Tomasulo Worksheet**
Instruction	Issue	ROB/ResStat	FU/EX	MEM	CDB	Commit
`MUL.D F1,F2,F3`	1	#1/FMUL1	FM1/2-6	--	7	8
`L.D F2,8(R1)`
`S.D F2,32(R2)`
`LW R3,48(R1)`
`L.D F3,0(R1)`
`ADD.D F4,F3,F1`
`DIV.D F5,F2,F11`
`DADDIU R1,R1,#8`
`LW R5,0(R2)`
`BNE R1,R5,LAB1`
`SUB.D F6,F1,F2`
`MUL.D F7,F2,F6`
`S.D F7,0(R8)`

Hint: This table is a hint to get you started:

**Tomasulo Hint**
Instruction	Issue	ROB/ResStat	FU/EX	MEM	CDB	Commit
`MUL.D F1,F2,F3`	1	#1/FMUL1	FM1/2-6	--	7	8
`L.D F2,8(R1)`	1	#2/LD1	A1/2-2	3	4	8
`S.D F2,32(R2)`	1	#3/ST1	A2/2-2	--	--	8
`LW R3,48(R1)`	1	#4/LD2	A1/3-3	4	5	8
`L.D F3,0(R1)`	5	#5/LD1	A1/6-6	7	8	9
`ADD.D F4,F3,F1`
`DIV.D F5,F2,F11`
`DADDIU R1,R1,#8`
`LW R5,0(R2)`
`BNE R1,R5,LAB1`
`SUB.D F6,F1,F2`
`MUL.D F7,F2,F6`
`S.D F7,0(R8)`

About the hint:

The first four instructions can all issue in cycle 1; there are no structural hazards between them and all of the register values are available in the register file. The fifth instruction, L.D, waits because we are only issuing up to 4 instructions per cycle, and L.D cannot issue until a Load reservation station becomes available after cycle 4.
The 2nd through 5th instructions all use the A[1-2] functional units because they are all memory instructions that need to compute an effective address.
Why does the LW read memory in cycle 4 instead of 3? Two reasons: 1) the effective address computation occurs in the 3rd cycle because A[1-2] were in use before then by other instructions, and 2) even if we had ample adder function units, we still want loads to access memory in order to avoid data hazards through memory. (A more robust machine with memory disambiguation might be able to do both loads simultaneously.)
Why can't the second instruction, L.D, commit in cycle 5, one cycle after it has broadcast its result to the CDB? Because we commit in order. Remember, we issue in order, execute out of order, and commit in order.
Why does the S.D instruction use A2 and not A1? Because A1 is busy in cycle 2, so the issue logic chooses A2 instead.
Why does the L.D F3,0(R1) instruction use the A1 function unit and not the A2 functional unit? No reason. Both of them were available when the instruction was issued, so the issue logic could have chosen either one, and so can you. However, it's a good idea to choose the lowest numbered functional unit that is available since that is probably what your instructor is going to do and it makes it easier to grade :-)
Why isn't the MEM field filled out for the S.D instruction? On page 121 of your book, see a similar table. MEM is an abbreviation for "Read access at clock number." We don't really need to worry about writes to memory, just reads because we need to make sure they happen in order with respect to one another.

You may not work together on this assignment with other classmates or receive assistance from any person other than your professor. Use only your book and resources provided by your instructor as a resource for assistance. Do not use solutions you find on the Internet or from any other source.

Your assignment should be nicely typed or word-processed. Turn in this assignment at the beginning of class on November 21, 2011. You must turn in the assignment on paper at the beginning of class.

Late assignments will not be accepted.