Loop: LD    R1,0(R2)    ; load R1 from address 0+R2
      DADDI R1,R1,#1    ; R1 = R1 + 1
      SD    0(R2),R1    ; store R1 at address 0+R2
      DADDI R2,R2,#4    ; R2 = R2 + 4
      DSUB  R4,R3,R2    ; R4 = R3 - R2
      BNEZ  R4,Loop     ; branch to Loop if R4 != 0

Assume that the initial value of R3 is R2 + 396.
Throughout this exercise use the classic RISC five-stage integer pipeline (see Figure A.1) and assume all memory accesses take one clock cycle. Do not assume a branch delay slot; the instruction after a branch should not be executed if the branch is taken.
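Before drawing any timing diagram, it can help to confirm how many times the integer loop runs. The sketch below is only an illustration: it tracks the R2/R3 arithmetic from the listing and ignores pipeline timing entirely. The function name `loop_trip_count` and its parameters are hypothetical, not part of the exercise.

```python
def loop_trip_count(span_bytes: int, stride_bytes: int) -> int:
    """Count iterations until R2 reaches R3, given R3 = R2 + span_bytes initially."""
    if stride_bytes <= 0 or span_bytes <= 0 or span_bytes % stride_bytes != 0:
        # With BNEZ on (R3 - R2), the loop only exits if R2 lands exactly on R3.
        raise ValueError("span must be a positive multiple of the stride")
    r2, r3 = 0, span_bytes          # only the difference R3 - R2 matters
    trips = 0
    while True:
        trips += 1
        r2 += stride_bytes          # DADDI R2,R2,#4
        if r3 - r2 == 0:            # DSUB R4,R3,R2 ; BNEZ R4,Loop falls through
            break
    return trips


trips = loop_trip_count(396, 4)
print(trips)        # 99 iterations
print(trips * 6)    # 594 dynamic instructions (6 per iteration)
```

The total cycle count then follows from the cycles per iteration that your pipeline diagram gives, multiplied by the trip count and adjusted for pipeline fill and the final branch.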
Loop: L.D    F0,0(R2)   ; load F0 from address 0+R2
      L.D    F4,0(R3)   ; load F4 from address 0+R3
      MUL.D  F0,F0,F4   ; F0 = F0 * F4
      ADD.D  F2,F0,F2   ; F2 = F0 + F2
      DADDUI R2,R2,#8   ; R2 = R2 + 8
      DADDUI R3,R3,#8   ; R3 = R3 + 8
      DSUBU  R5,R4,R2   ; R5 = R4 - R2
      BNEZ   R5,Loop    ; branch to Loop if R5 != 0

Assume that the initial value of R4 is R2 + 792.
For this exercise assume the standard five-stage integer pipeline and the MIPS FP pipeline as described in Section A.5 of your book. If structural hazards are due to write-back contention, assume the earliest instruction gets priority and other instructions are stalled. See Figure A.30 for functional unit latencies.
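As a reading aid, the floating-point loop can be mirrored in a high-level language. This is a minimal sketch of what the loop computes, not of how it is scheduled: it accumulates the products of paired elements into F2, one pair per iteration. The function name `fp_loop` and the sample data are hypothetical.

```python
def fp_loop(a, b, f2=0.0):
    """Mirror the loop body: F2 accumulates the element-wise products of a and b."""
    for x, y in zip(a, b):   # R2 and R3 advance together, 8 bytes (one double) per trip
        f0 = x * y           # L.D F0 ; L.D F4 ; MUL.D F0,F0,F4
        f2 = f0 + f2         # ADD.D F2,F0,F2
    return f2


n = 792 // 8                 # R4 = R2 + 792 with a stride of 8 bytes
print(n)                     # 99 elements per array
print(fp_loop([1.0] * n, [2.0] * n))   # 198.0
```

Because each ADD.D reads the F2 written by the previous iteration, the accumulation forms a dependence chain across iterations, which is worth keeping in mind when looking for stalls.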
| Conditional branches | | |
| Jumps and calls | | |
| Conditional branches | |
We are examining a four-stage pipeline in which unconditional branches are resolved at the end of the second cycle and conditional branches at the end of the third cycle. Assuming that only the first pipeline stage can always be completed regardless of whether the branch is taken, and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?
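One way to organize the calculation is to express the speedup as the ratio of the CPI with branch stalls to the ideal CPI of 1. The sketch below is an illustration under stated assumptions, not the official solution. The stall counts (1 cycle for jumps and calls, 2 for taken conditional branches, 1 for not-taken conditional branches) are one reading of the resolution points described above, and the frequencies in the example call are hypothetical placeholders; substitute the values from the table.

```python
def speedup_without_branch_hazards(
    f_cond: float,             # fraction of instructions that are conditional branches
    f_jump: float,             # fraction that are jumps and calls
    taken_frac: float,         # fraction of conditional branches that are taken
    stall_jump: int = 1,       # assumed: resolved at end of cycle 2
    stall_taken: int = 2,      # assumed: resolved at end of cycle 3, target fetched after
    stall_not_taken: int = 1,  # assumed: fall-through fetched, then waits one cycle
) -> float:
    """Return CPI_with_stalls / CPI_ideal, where CPI_ideal is 1."""
    stalls_per_instr = (
        f_jump * stall_jump
        + f_cond * taken_frac * stall_taken
        + f_cond * (1.0 - taken_frac) * stall_not_taken
    )
    return 1.0 + stalls_per_instr


# Hypothetical frequencies, for illustration only.
print(round(speedup_without_branch_hazards(f_cond=0.20, f_jump=0.05, taken_frac=0.60), 2))  # 1.37
```

Keeping the stall counts as parameters makes it easy to check how the answer changes if you interpret the resolution points differently.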
Late assignments will not be accepted.