CS 505: Computer Structures, Fall 2002
Midterm Examination

Answer each of the following questions. Write your answers in the bluebooks provided. You may use your own blank paper as scratch paper, but whatever you turn in should be clear and legible. You may not use your book, your notes, or any other such material during this exam. You may use a calculator.

Recall that there are two kinds of CMOS transistors:

PMOS transistors, triggered when a 0 is applied to the "gate" terminal, and able to pass 1's easily.
NMOS transistors, triggered when a 1 is applied to the "gate" terminal, and able to pass 0's easily.

Draw a NOT gate (i.e. an inverter circuit) using CMOS transistors. Clearly label the input and output of the circuit.

Solution:

              1
             _|
        |-o||_
___in___|     |__out___
        |    _|
        |--||_
              |
              0

The top transistor is a PMOS transistor, and the bottom one is an NMOS
transistor.  A one flows from the source of the PMOS to the drain and
thus the output if the input to the PMOS is zero.  A zero flows from the
source of the NMOS to the drain and thus the output if the input is a one.
Note that only one transistor can be triggered for a given input.

Three enhancements with the following speedups are proposed for a new architecture:
- Speedup₁ = 30
- Speedup₂ = 20
- Speedup₃ = 15
For example, during the time that enhancement 1 is in use, the program runs 30 times faster than it would normally.
Only one enhancement is usable at a time. Answer the following questions:
- If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10? Show your work.
- Assume, for some benchmark, the possible fraction of use is 15% for each of enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should it be? Show your work.
```
Note that this is the same as the first two parts of problem 1.16 in your
textbook, which was one of the problems assigned for study.  The solution
can be found on page B-5.
```

Consider the following C program:

#include 

int main (void) {
	int	i, j, k;

	/* initialize k to 0 */

	k = 0;

	/* let i take on values from 1 through 20 */

	for (i=1; i<=20; i++) {

		/* let j equal the square of i */

		j = i;
		j = j * j;

		/* accumulate the square of i in k */

		k = k + j;
	}

	/* print the resulting sum and exit */

	printf ("%d\n", k);
	exit (0);
}

The program prints the sum of the squares of the numbers from 1 through 20. When compiled with the GCC compiler on a Pentium 4 and edited for clarity, it looks like this:

 1   .LC0:
 2   	.string	"%d\n"
 3   main:
 4   			# i is %edx, j is %eax, and k is %ecx
 5   	xorl %ecx,%ecx	# set k equal to zero by xor'ing it with itself
 6   	movl $1,%edx	# move one into i
 7   .L21:
 8    	movl %edx,%eax	# move i into j
 9   	imull %eax,%eax	# set j equal to j times j
10    	addl %eax,%ecx	# set k equal to k plus %eax
11    	incl %edx	# i equals i plus one
12    	cmpl $20,%edx	# compare i with twenty
13    	jle .L21	# if less than or equal, go back to .L21
14    	pushl %ecx	# push k on the stack as an argument of printf
15    	pushl $.LC0	# push the address of "%d\n" as an argument of printf
16    	call printf	# call printf with the arguments "%d\n" and value of k
17    	pushl $0	# push a zero on the stack as an argument to exit
18    	call exit	# call exit, leaving the program

Answer the following questions about this assembly language program:

How many instructions are executed? (Don't try to count any instructions executed by the implementations of printf or exit.)

The instructions from line 7 through 13 are executed 20 times; all the
other instructions are executed once.  Thus, the number of instructions
executed is 7 + 6(20) = 127.

Suppose imull takes four cycles, branches and calls take three cycles, memory instructions take two cycles, and all other instructions take one cycle. How many cycles will the program take?

Note that pushl instructions are implicitly memory instructions,
since they store a value to the stack.  The loop consists of one
imull and four single-cycle instructions, taking 8 cycles for
one iteration, or 8(20) = 160 cycles total.  The other instructions add
up to 14 cycles, for a total of 174 cycles.

Consider the microarchitecture of the processor for the previous part of this problem. Let's call that version of the microarchitecture "A." A runs at a clock rate of 1 GHz, i.e., the clock ticks one billions times each second. Suppose a new microarchtecture called "B" is proposed. In microarchitecture B, each instruction takes only one cycle. However, it runs at only 667 MHz, i.e., the clock ticks 667 million times each second. On which architecture will the program run faster? Show how you came up with your answer.
```
We know from the previous part that the program will take 174 cycles times
the clock period of one nanosecond, or 174 nanoseconds on microarchitecture
A.  The clock period for microarchitecture B is (1 / 667,000,000) seconds,
or approximately 1.5 nanoseconds.  We know from the first part that there are
127 instructions, and if each one takes a single cycle, that is (1.5)127 =
190.5 nanoseconds.  Thus, the program runs faster on microarchitecture A.
```

Gives the line number of every instruction that does not have a dependence with another different instruction.


This question is open to interpretation.  The pushl instructions
that push constants onto the stack have no explicit dependences with
other instructions, but they do implicity depend on each other because
they read and write the stack pointer register.  It is correct to say
that every instruction has a dependence with some other instruction,
so the answer to the problem is to list no line numbers.  Other answers
under other assumptions may also have received credit.

Consider each of the following terms:

Little endian
Instruction format
Dynamic branch prediction
Pipelining
Alignment
Addressing mode
PC-relative addressing

For each term, answer the following questions:

What is the definition of the term?
Does the term relate to instruction set architecture? If so, how?

Little endian refers to the practice of storing multi-byte quantities in
order from least to most significant bytes.  It does relate to ISA because
the ISA specifies whether operands should be little endian or big endian.

The instruction format for an instruction set specifies how instructions
are encoded, e.g., what and where opcodes, operands, addressing modes, etc.
are stored in machine language instructions.  It does relate to ISA because
the ISA is specified through the instruction format.

Dynamic branch prediction is the practice of consulting an on-line learning
component to estimate the outcomes of conditional branches.  It does not
relate to ISA.

Pipelining is a mechanism by which the execution of instructions is divided
into stages that are executed in parallel, similar to an assembly line.
It does not relate to ISA.

Alignment refers to the placement of multi-byte quantities in memory.
An architecture may enforce alignment of n-byte words on memory addresses
divisible by n.  It relates to ISA because instruction sets may enforce
alignment restrictions on memory operands.

An addressing mode is a way an instruction specifies how to generate
an effective address from its operands.  It does relate to ISA because
addressing modes are part of how instructions work.

PC-relative addressing is an addressing mode in which the effective address
is computed by adding an offset to the current value of the program counter
register.  It does relate to ISA because PC-relative addressing has to be
encoded in instructions.

Suppose conditional branches make up 10% of the instructions executed in a program. Consider the case of a five-stage pipeline that uses pipeline stall cycles to resolve branch hazards. That is, when a branch is decoded, the pipeline freezes up to the decode stage until the register upon which the branch depends is written to the register file. Assume writes to the register file occur in the first half of the WB stage, and reads occur in the second half. Ignore all pipeline hazards except for branch hazards caused by conditional branches.
- What is the instructions per cycle (IPC) of this machine?
```
Let us assume that the fetch stage is able to fetch the next instruction
immediately when the branch outcome becomes available for reading from
the register file.  
Four cycles must pass from the time the branch is fetched to the time the next
fetch can occur (a fetch that occurred when the branch was in the decode
stage would be invalidated since we don't know if it was the right
instruction).  So, branches cost 4 cycles to execute, where other instructions
cost a single cycle.  Since branches occur 10% of the time, the CPI of
this machine is 0.1(4) + 0.9(1) = 1.3.  Thus, the IPC is 1 / 1.3 = 0.77.
```
- Suppose we add bypass to the pipeline so that branches in the EX stage immediately forward the effective addresses of their targets to the IF stage. What is the IPC of this machine?
- Suppose branch prediction is added to the machine. The branch predictor happens to perfectly predict every branch in the program, but the pipeline depth must be increased to six. What is the IPC of this machine?
Consider the following C program:
```
int A[100][5];
int main () {
        int     i, j;
        for (i=0; i<100; i++) for (j=0; j<5; j++) A[i][j] = 0;
}
```
When it is compiled, the second for statement will result in a branch that will be taken four times, then not taken once, repeating this pattern 100 times. Call this branch "branch A." You needn't show your work for this problem.
- With a Smith branch predictor using a table of 1-bit entries, approximately what percentage of the time will branch A be predicted correctly?
- With a Smith branch predictor enhanced to maintain hysteresis bits for each entry, approximately what percentage of the time will branch A be predicted correctly?
- With a two-level adaptive branch predictor with a history length of 11, approximately what percentage of the time will branch A be predicted correctly?

CS 505: Computer Structures, Fall 2002 Midterm Examination

CS 505: Computer Structures, Fall 2002
Midterm Examination