CS 3853 Spring 2010, Homework 3

Due at 11:59pm on Thursday, March 4, 2010.
  1. Use the following code fragment:
    Loop:   LD      R1,0(R2)        ; load R1 from address 0+R2
            DADDI   R1,R1,#1        ; R1 = R1 + 1
            SD      0(R2),R1        ; store R1 at address 0+R2
            DADDI   R2,R2,#4        ; R2 = R2 + 4
            DSUB    R4,R3,R2        ; R4 = R3 - R2
            BNEZ    R4,Loop         ; branch to Loop if R4 != 0
    
    Assume that the initial value of R3 is R2 + 396.
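
The iteration count follows from this initial condition and the stride of the pointer update. The sketch below replays only the loop-control instructions (DADDI, DSUB, BNEZ). The function name and the starting address of 0 are illustrative assumptions, not part of the assignment.

```python
def count_iterations(offset: int, stride: int, r2: int = 0) -> int:
    """Replay the loop-control instructions and count how often the body runs.

    offset: initial value of R3 - R2 (396 in the fragment above)
    stride: amount added to R2 per iteration (4 in the fragment above)
    r2:     arbitrary starting address (assumed 0; the count does not depend on it)
    """
    if stride <= 0 or offset <= 0 or offset % stride != 0:
        # In this simple model R4 would never reach exactly zero.
        raise ValueError("offset must be a positive multiple of stride")
    r3 = r2 + offset
    iterations = 0
    while True:
        iterations += 1      # the body runs before the test (test is at the bottom)
        r2 += stride         # DADDI R2,R2,#4
        r4 = r3 - r2         # DSUB  R4,R3,R2
        if r4 == 0:          # BNEZ  R4,Loop falls through when R4 == 0
            return iterations


print(count_iterations(396, 4))  # 99
```

The branch is taken on every iteration except the last, which matters when counting flush or misprediction cycles.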

    Throughout this exercise use the classic RISC five-stage integer pipeline (see Figure A.1) and assume all memory accesses take one clock cycle. Do not assume a branch delay slot; the instruction after a branch should not be executed if the branch is taken.

    1. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware but assuming a register read and a write in the same clock cycle "forwards" through the register file, i.e., writes are done in the first half of the cycle and reads are done in the second half. Use a pipeline timing chart like Figures A.5 and A.10. Assume that the branch is handled by flushing the pipeline. If all memory references take one cycle, how many cycles does this loop take to execute?
    2. Show the timing of this instruction sequence for the RISC pipeline with normal forwarding and bypassing hardware. Use a pipeline timing chart as before. Assume that the branch is handled by predicting it as not taken (see page A-22). Assume that branches are resolved in the EX stage, i.e., on a mispredicted branch, fetch may resume on the correct path on the cycle after the branch's EX stage. Note that the decode stage does not necessarily need to stall if a register it is trying to read is not available yet since the value will be forwarded. If all memory references take one cycle, how many cycles does this loop take to execute?
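
A timing chart of the kind requested above can be laid out by hand or generated. The following is a minimal sketch of a chart formatter, assuming each instruction is described by a label, the cycle in which it is fetched, and the list of stages (including stalls) it occupies in consecutive cycles. The helper names and the two example instructions are hypothetical and are not taken from the homework fragment.

```python
def total_cycles(rows):
    """Return the last clock cycle occupied by any instruction."""
    return max(first + len(stages) - 1 for _, first, stages in rows)


def timing_chart(rows):
    """Format rows of (label, first_cycle, stages) as a pipeline timing chart.

    stages lists what the instruction does in each consecutive cycle,
    e.g. ["IF", "ID", "stall", "EX", "MEM", "WB"].
    """
    last = total_cycles(rows)
    width = max(len(label) for label, _, _ in rows)
    # Header row: one column per clock cycle.
    lines = [" " * width + " |" + "".join(f"{c:>6}" for c in range(1, last + 1))]
    for label, first, stages in rows:
        cells = [""] * last
        for i, stage in enumerate(stages):
            cells[first - 1 + i] = stage
        lines.append(f"{label:<{width}} |" + "".join(f"{c:>6}" for c in cells))
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical hazard-free example (these two instructions are NOT from the homework).
rows = [
    ("DADD R1,R2,R3", 1, ["IF", "ID", "EX", "MEM", "WB"]),
    ("DSUB R4,R5,R6", 2, ["IF", "ID", "EX", "MEM", "WB"]),
]
print(timing_chart(rows))
print("cycles:", total_cycles(rows))
```

Stalls are entered explicitly as extra entries in the stage list, so the formatter records a schedule rather than deriving one. Working out where the stalls go is still the exercise.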
  2. Use the following code fragment:
    Loop:   L.D     F0,0(R2)        ; load F0 from address 0+R2
            L.D     F4,0(R3)        ; load F4 from address 0+R3
            MUL.D   F0,F0,F4        ; F0 = F0 * F4
            ADD.D   F2,F0,F2        ; F2 = F0 + F2
            DADDUI  R2,R2,#8        ; R2 = R2 + 8
            DADDUI  R3,R3,#8        ; R3 = R3 + 8
            DSUBU   R5,R4,R2        ; R5 = R4 - R2
            BNEZ    R5,Loop         ; branch to Loop if R5 != 0
    
    Assume that the initial value of R4 is R2 + 792.

    For this exercise assume the standard five-stage integer pipeline and the MIPS FP pipeline as described in Section A.5 of your book. If structural hazards are due to write-back contention, assume the earliest instruction gets priority and other instructions are stalled. See Figure A.30 for functional unit latencies.

    1. Using a timing diagram like the ones in Figures A.33 and A.34, show the timing of this instruction sequence for the MIPS FP pipeline without any forwarding or bypassing hardware but assuming a register read and write in the same clock cycle "forwards" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references hit in the cache, how many cycles does this loop take to execute?
    2. Show the timing of this instruction sequence for the MIPS FP pipeline with normal forwarding and bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory references hit in the cache, how many cycles does this loop take to execute?
  3. Suppose the branch frequencies (as percentages of all instructions) are as follows:

    Instruction type        Frequency
    Conditional Branches    15%
    Jumps and calls         1%
    Conditional Branches    60% are taken

    We are examining a four-stage pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assume that only the first pipeline stage can always be completed regardless of the branch outcome, and ignore all other pipeline stalls. How much faster would the machine be without any branch hazards?
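
The general relationship behind this kind of question is that the speedup equals the CPI with branch stalls divided by the ideal CPI, where each class of branch adds (frequency x stall cycles) to the CPI. The sketch below encodes only that formula, assuming an ideal CPI of 1. The function name is hypothetical, and the example numbers are deliberately not the ones from this problem; the stall cycles for each branch class must be derived from the pipeline description.

```python
def speedup_without_branch_hazards(branch_classes, ideal_cpi=1.0):
    """Speedup of a hazard-free machine over one that stalls on branches.

    branch_classes: iterable of (frequency, stall_cycles) pairs, where
    frequency is a fraction of ALL instructions and stall_cycles is the
    penalty paid by each instruction in that class.
    """
    stall_cpi = sum(freq * stalls for freq, stalls in branch_classes)
    return (ideal_cpi + stall_cpi) / ideal_cpi


# Hypothetical numbers, NOT from this problem:
# 20% of instructions stall 2 cycles, 5% stall 1 cycle.
example = speedup_without_branch_hazards([(0.20, 2), (0.05, 1)])
print(round(example, 2))  # 1.45
```

When a class is defined as a fraction of another class (for example, taken conditional branches), its frequency is the product of the two fractions.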

Turn in your assignment as a PDF file emailed to our teaching assistant by 11:59pm on Thursday, March 4, 2010. You may not work together on this assignment. You may not receive assistance from anyone other than your professor or teaching assistant.

Late assignments will not be accepted.