Lecture 2: Instruction Set Architectures and Compilers

Instruction Set Architectures

An Instruction Set Architecture (ISA) is an agreement about how software will communicate with the processor. A common scenario in an ISA has the following features:

A description of the ISA for a processor answers the following questions:

What are some things not specified by the ISA?

Example

Let's look at an example of an ISA. We'll begin by looking at a program written in assembly language, a language of mnemonic instructions. There's usually a one-to-one correspondence between assembly language instructions and machine instructions.
	ori	r1,r0,16	// OR immediate; r1 := r0 OR 16
	ori	r2,r0,1		// r2 := r0 OR 1
loop:	
	dadd	r3,r2,r2	// r3 := r2 + r2
	movz	r2,r3,r0	// r2 := r3 if r0 == 0
	dsubi	r1,r1,1		// r1 := r1 - 1
	bnez	r1,loop		// if r1 != 0 then goto loop
This program will be assembled into six 32-bit instructions in the MIPS ISA. What does the program do?

Types of Instruction Sets

There are three main types of instruction sets: stack, accumulator, and general-purpose register architectures. These days, the first two are really only of historical interest. It turns out that the Java virtual machine is like a stack-based ISA, only at a higher level of representation. The x86 architecture is based on an ancient instruction set that had some aspects of an accumulator architecture; it is classified as a special-purpose register machine.

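Of the general-purpose register (GPR) architectures, there are two main types: register-memory architectures, in which arithmetic and logic instructions can take an operand directly from memory, and load/store (register-register) architectures, in which only load and store instructions access memory and all other instructions operate on registers.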

So-called Reduced Instruction Set Computing (RISC) architectures are load/store architectures. Examples: SPARC, MIPS, Alpha AXP, PowerPC. Complex Instruction Set Computing (CISC) architectures are usually register-memory architectures. Examples: VAX, x86, MC68000.

Memory Addressing

Addressing memory is one of the most important functions of the ISA, and the ISA specifies exactly how memory is addressed.

Most computer systems divide memory into 8-bit bytes. The ISA decides how to format these bytes into larger structures such as 32-bit integers. Some of the important aspects of this organization are:

Operand Types

An operand is a value that an instruction operates on. By giving an instruction type and an addressing mode, we have in effect specified the operands for the instruction. What kinds of operands are there?

Types of Instructions

Instruction Encoding

How are instruction types, operands, addressing modes, etc. communicated to the hardware? The ISA specifies a binary encoding of instructions. The assembler encodes programs using this encoding, and the microarchitecture reads and executes the encoded program. The MIPS instruction set is a good example.

Example: The MIPS instruction set

Every instruction in the MIPS instruction set is 32 bits long. Let's number the bits from 0 to 31, with 31 being the most significant bit. The first six bits, bits 31-26, hold an opcode that tells what the instruction is supposed to do. There are 32 integer registers, most of them general purpose; register R0 always reads as 0. In the 64-bit version of MIPS used here, registers and addresses are 64 bits wide. MIPS is byte-addressable, requires aligned accesses, and can be switched to either big-endian or little-endian mode.

There are three general types of instructions:

Compilers

A compiler is a program that translates programs from a high-level language into machine instructions that can be directly executed by the CPU. The compiler has a large and, these days, increasing impact on the performance of computer systems.

Structure of an Optimizing Compiler

An optimizing compiler puts a great deal of effort into improving the quality of the generated machine code. The compiler has the following stages:
  1. Front end. The front end reads in the source code written by the programmer. It performs lexical analysis and parsing to get the code into an intermediate form that can be easily worked with in the rest of the compiler. The result of the front end is an intermediate representation of the program code, e.g. an abstract syntax tree or three-address code.
  2. High-level optimizations. This stage performs high-level code-improving transformations on the intermediate representation. The transformations at this stage are mostly machine-independent, i.e., they require little or no knowledge about the ISA. Some examples:
  3. Low-level optimizations. At this stage, the code is transformed into a lower-level intermediate representation that has more information about the ISA. Examples of possible optimizations:
  4. Code generation. At this stage, assembly-language code is generated from the intermediate representation. Some optimizations may be performed at this stage:
  5. Assembly. The assembler transforms the assembly language program into machine instructions ready to be loaded. Certain alignment optimizations may be performed at this time.
Note that in some compilers, some phases may be iterated over several times, and the same types of optimizations may be done at different levels of intermediate representations.

Effect of Compiler on the Architecture

The computer architect must be aware of compiler technology. Compilers determine the mix of instructions that will be executed. These days, computer architects and compiler writers must cooperate to achieve improved performance.

Example Problems

Here are some examples of problems that might occur when there isn't enough communication between the hardware and software people.

Note that those last two items might look inconsistent with one another. Compiler writers and architects should get together and decide what parts of the implementation should be exposed and what parts are best left hidden.

Example Compiler Output

Consider the following C program:
#include <stdio.h>
#include <stdlib.h>
int a;

int main () {
	int	i;

	a = 0;
	for (i=0; i<100; i++) {
		a = a + i;
	}
	printf ("%d\n", a);
	exit (0);
}
With GCC version 2.95.4 on my Pentium 4, it compiles into something like this:
.LC0:
	.string	"%d\n"
	.comm	a,4,4

main:
	movl $0,a		# store a 0 to memory location a
	xorl %edx,%edx		# edx := edx xor edx, i.e., edx := 0
.L21:
	movl %edx,%eax		# eax := edx
	addl a,%eax		# eax := eax + a
	movl %eax,a		# a := eax
	incl %edx		# edx := edx + 1
	cmpl $99,%edx		# compare edx to 99
	jle .L21		# if less than or equal then goto .L21
	pushl %eax		# push eax onto the stack
	pushl $.LC0		# push the address of "%d\n" onto the stack
	call printf		# call the printf function
	pushl $0		# push a 0 onto the stack
	call exit		# call the exit function
Let's go through this program and see what's going on.