## CS 3853 Fall 2009 Second Exam

Write your answers on this question paper. Your answers must be clear and legible. You may not use your book, your notes, or any other such material during this exam. You may use a calculator. Use the space provided to show how you arrived at your answer for partial credit. Only the answers you write in the spaces provided will be graded; you may use the back of the pages for your own private calculations. If a question seems ambiguous, state any assumptions you need to make to work the problem.

- I. **True/False.** For each of the following answer 'T' for True or 'F' for False.
  - 1. \_\_\_\_A Single-Instruction-Multiple-Data (SIMD) model of computation is characterized by independent threads computing on private memories.
  - 2. <u>A Distributed Memory Multiprocessor is also called a Symmetric</u> Multiprocessor (SMP).
  - 3. \_\_\_\_ Suppose we expect at least an 80X parallel speedup from 100 processors. At most 0.25% of the execution of the original program may be sequential (i.e., non-parallelizable).
  - 4. \_\_\_\_\_ Suppose a conditional branch alternates between taken and not taken every time it is executed. A one-bit BHT predictor is likely to have a misprediction rate of 100% on this branch.
  - 5. \_\_\_\_ Suppose a conditional branch is taken the first 1000 times it is executed, and not taken the second 1000 times. A one-bit BHT predictor is likely to have performance similar to a correlating branch predictor on this branch.
  - 6. The reorder buffer enables precise exceptions and interrupts.
  - 7. In Tomasulo's algorithm with speculation, when a branch is mispredicted, the data written to memory and the register file as a result of that branch are rolled back to their previous values.
  - 8. \_\_\_\_ In Tomasulo's algorithm without speculation, instructions may complete out of order.
  - 9. \_\_\_\_ In Tomasulo's algorithm with speculation, instructions may complete out of order.
  - 10. \_\_\_\_ A return address predictor is likely to mispredict when there is deep recursion.

- II. Consider two machines, machine A and machine B. For both machines, all instructions except for mispredicted branches take one cycle. Mispredicted branches take one cycle plus an additional misprediction latency. Machine A has a clock rate of 1.0 GHz and a misprediction penalty of 5 cycles. Machine B has a clock rate of 2.0 GHz and a misprediction penalty of 20 cycles. Branches are 25% of all instructions.
  - 1. With static branch prediction, 80% of branches are predicted correctly.

Which machine is faster, machine A or B?

What is the speedup?

 With dynamic branch prediction, 95% of branches are predicted correctly. Which machine is faster, machine A or B?

What is the speedup?

III. 18. Assume the classic 5-stage pipeline, with registers being written in the first half of WB and read in the last half of ID. Consider the following code:

| L    | R9, | 0(R5) |    |  |
|------|-----|-------|----|--|
| DADD | R1, | R7,   | R8 |  |
| DSUB | R2, | R1,   | R9 |  |
| DADD | R3, | R1,   | R2 |  |

1. How many cycles does the code take, from first fetch to last write-back, assuming no forwarding?

2. How many cycles would the code take, from first fetch to last write-back, if we allow forwarding?

IV. Consider a direct-mapped data cache with a 32 blocks of 64 bytes each. Consider the following C code to sum together all of the elements of a two-dimensional array of 64-bit double-precision floating point numbers.

Assume all blocks in the cache are initially invalid. How many cache misses will result from the C code?

- V. Assume we have a MIPS machine that uses the speculative version of Tomasulo's algorithm. It can issue up to two instructions per clock cycle. Make the following assumptions about the machine:
  - 1. FP addition executes in 3 cycles
  - 2. FP multiplication executes in 5 cycles
  - 3. Integer operations and effective address addition execute in 1 cycle
  - 4. Data can be accessed in the data cache in 1 cycle
  - 5. The FPUs are fully pipelined and there are 2 FP adders (FA1,FA2), 1 FP multiplier (FM1)
  - 6. There are two ALUs (A1,A2), which handle effective address calculation along with all other integer operations.
  - 7. There are 8 entries in the reorder buffer
  - 8. The Issue and Commit columns of the table can handle up to two instructions (each) per cycle
  - 9. The CDB can handle up to two instructions per clock cycle.
  - 10. Operands written on the CDB in one clock cycle may be used in the next clock cycle.
  - 11. At most one instruction may be accessing the data cache in each clock cycle
  - 12. The remaining FUs all have two reservation stations are, named: FMUL[1,2], FAD[1,2], ALU[1,2], ST[1,2], LD[1,2]

Copy and fill in the following table indicating the cycles in which each instruction performs each step. Note that this table has been augmented to allow you to track the functional unit, reservation station and ROB entry being used, which should ease your job of finding hazards. Notice that the FU/EX entry indicates what functional unit is performing computation for the instruction, and what cycles it begins on and ends on. Three copies of the Tomasulo worksheet are provided; use the first two for practice if you like, but write your final answer in the third one (we will grade only the third worksheet).

| Instruction      | Issue | ROB/ResStat | FU/EX | MEM | CDB | Commit |
|------------------|-------|-------------|-------|-----|-----|--------|
| L.D F1, 0(R1)    |       |             |       |     |     |        |
| L.D F2, 0(R2)    |       |             |       |     |     |        |
| ADD.D F2, F1, F2 |       |             |       |     |     |        |
| MUL.D F4, F1, F2 |       |             |       |     |     |        |
| ADD.D F5, F2, F3 |       |             |       |     |     |        |
| S.D F5, 0(R3)    |       |             |       |     |     |        |

Tomasulo Worksheet #1 (use this for practice)

| Instruction      | Issue | ROB/ResStat | FU/EX | MEM | CDB | Commit |
|------------------|-------|-------------|-------|-----|-----|--------|
| L.D F1, 0(R1)    |       |             |       |     |     |        |
| L.D F2, 0(R2)    |       |             |       |     |     |        |
| ADD.D F2, F1, F2 |       |             |       |     |     |        |
| MUL.D F4, F1, F2 |       |             |       |     |     |        |
| ADD.D F5, F2, F3 |       |             |       |     |     |        |
| S.D F5, 0(R3)    |       |             |       |     |     |        |

Tomasulo Worksheet #2 (use this for practice)

## Tomasulo Worksheet #3 (write your final answer here)

| Instruction      | Issue | ROB/ResStat | FU/EX | MEM | CDB | Commit |
|------------------|-------|-------------|-------|-----|-----|--------|
| L.D F1, 0(R1)    |       |             |       |     |     |        |
| L.D F2, 0(R2)    |       |             |       |     |     |        |
| ADD.D F2, F1, F2 |       |             |       |     |     |        |
| MUL.D F4, F1, F2 |       |             |       |     |     |        |
| ADD.D F5, F2, F3 |       |             |       |     |     |        |
| S.D F5, 0(R3)    |       |             |       |     |     |        |