Solution:

              1
             _|
        |-o||_
___in___|     |__out___
        |    _|
        |--||_
              |
              0

The top transistor is a PMOS transistor, and the bottom one is an NMOS transistor. A one flows from the source of the PMOS to the drain and thus the output if the input to the PMOS is zero. A zero flows from the source of the NMOS to the drain and thus the output if the input is a one. Note that only one transistor can be triggered for a given input.
Only one enhancement is usable at a time. Answer the following questions:
Note that this is the same as the first two parts of problem 1.16 in your textbook, which was one of the problems assigned for study. The solution can be found on page B-5.
#include <stdio.h>
#include <stdlib.h>

int main (void) {
        int     i, j, k;

        /* initialize k to 0 */
        k = 0;

        /* let i take on values from 1 through 20 */
        for (i=1; i<=20; i++) {

                /* let j equal the square of i */
                j = i;
                j = j * j;

                /* accumulate the square of i in k */
                k = k + j;
        }

        /* print the resulting sum and exit */
        printf ("%d\n", k);
        exit (0);
}

The program prints the sum of the squares of the numbers from 1 through 20. When compiled with the GCC compiler on a Pentium 4 and edited for clarity, the resulting assembly looks like this:
 1  .LC0:
 2          .string "%d\n"
 3  main:
 4          # i is %edx, j is %eax, and k is %ecx
 5          xorl    %ecx,%ecx       # set k equal to zero by xor'ing it with itself
 6          movl    $1,%edx         # move one into i
 7  .L21:
 8          movl    %edx,%eax       # move i into j
 9          imull   %eax,%eax       # set j equal to j times j
10          addl    %eax,%ecx       # set k equal to k plus %eax
11          incl    %edx            # i equals i plus one
12          cmpl    $20,%edx        # compare i with twenty
13          jle     .L21            # if less than or equal, go back to .L21
14          pushl   %ecx            # push k on the stack as an argument of printf
15          pushl   $.LC0           # push the address of "%d\n" as an argument of printf
16          call    printf          # call printf with the arguments "%d\n" and value of k
17          pushl   $0              # push a zero on the stack as an argument to exit
18          call    exit            # call exit, leaving the program

Answer the following questions about this assembly language program:
The six instructions on lines 8 through 13 (line 7 is only a label) are executed 20 times; each of the seven other instructions is executed once. Thus, the number of instructions executed is 7 + 6(20) = 127.
Note that pushl instructions are implicitly memory instructions, since they store a value to the stack. The loop consists of one imull and four single-cycle instructions, taking 8 cycles for one iteration, or 8(20) = 160 cycles total. The other instructions add up to 14 cycles, for a total of 174 cycles.
We know from the previous part that the program will take 174 cycles times the clock period of one nanosecond, or 174 nanoseconds on microarchitecture A. The clock period for microarchitecture B is (1 / 667,000,000) seconds, or approximately 1.5 nanoseconds. We know from the first part that there are 127 instructions, and if each one takes a single cycle, that is (1.5)127 = 190.5 nanoseconds. Thus, the program runs faster on microarchitecture A.
This question is open to interpretation. The pushl instructions that push constants onto the stack have no explicit dependences with other instructions, but they do implicitly depend on each other because they read and write the stack pointer register. It is correct to say that every instruction has a dependence with some other instruction, so the answer to the problem is to list no line numbers. Other answers under other assumptions may also have received credit.
Little endian refers to the practice of storing multi-byte quantities in order from least to most significant bytes. It does relate to ISA because the ISA specifies whether operands should be little endian or big endian.

The instruction format for an instruction set specifies how instructions are encoded, e.g., what and where opcodes, operands, addressing modes, etc. are stored in machine language instructions. It does relate to ISA because the ISA is specified through the instruction format.

Dynamic branch prediction is the practice of consulting an on-line learning component to estimate the outcomes of conditional branches. It does not relate to ISA.

Pipelining is a mechanism by which the execution of instructions is divided into stages that are executed in parallel, similar to an assembly line. It does not relate to ISA.

Alignment refers to the placement of multi-byte quantities in memory. An architecture may enforce alignment of n-byte words on memory addresses divisible by n. It relates to ISA because instruction sets may enforce alignment restrictions on memory operands.

An addressing mode is a way an instruction specifies how to generate an effective address from its operands. It does relate to ISA because addressing modes are part of how instructions work.

PC-relative addressing is an addressing mode in which the effective address is computed by adding an offset to the current value of the program counter register. It does relate to ISA because PC-relative addressing has to be encoded in instructions.
Let us assume that the fetch stage is able to fetch the next instruction immediately when the branch outcome becomes available for reading from the register file. Four cycles must pass from the time the branch is fetched to the time the next fetch can occur (a fetch that occurred when the branch was in the decode stage would be invalidated since we don't know if it was the right instruction). So, branches cost 4 cycles to execute, where other instructions cost a single cycle. Since branches occur 10% of the time, the CPI of this machine is 0.1(4) + 0.9(1) = 1.3. Thus, the IPC is 1 / 1.3 = 0.77.
int A[100][5];

int main () {
        int     i, j;

        for (i=0; i<100; i++)
                for (j=0; j<5; j++)
                        A[i][j] = 0;
}

When it is compiled, the second for statement will result in a branch that will be taken four times, then not taken once, repeating this pattern 100 times. Call this branch "branch A." You needn't show your work for this problem.