Lecture 2: Instruction Set Architectures and Compilers
Instruction Set Architectures
An Instruction Set Architecture (ISA) is an agreement about how software
will communicate with the processor. A typical ISA describes a machine with
the following features:
- A flat 32-bit address space
- A set of registers available to the programmer.
- A program counter register through which instructions are fetched,
initialized to some documented value.
- A set of external objects that can generate interrupts.
A description of the ISA for a processor answers the following questions:
- What instructions are available?
- What addressing modes are available?
- What is the format of data?
- How many and what kind of registers are available?
- What condition codes, if any, are defined?
- How are exceptions handled?
- How are interrupts handled?
What are some things not specified by the ISA?
- How fast will a particular instruction go?
- How is an instruction implemented?
- What are procedure calling conventions?
- What are cache replacement policies?
- What happens on a page fault?
Example
Let's look at an example of an ISA. We'll begin by looking at a program
written in assembly language, a language of mnemonic instructions.
There is usually a one-to-one correspondence between assembly-language
instructions and machine instructions.
ori r1,r0,16 // OR immediate; r1 := r0 OR 16
ori r2,r0,1 // r2 := r0 OR 1
loop:
dadd r3,r2,r2 // r3 := r2 + r2
movz r2,r3,r0 // r2 := r3 if r0 == 0
dsubi r1,r1,1 // r1 := r1 - 1
bnez r1,loop // if r1 != 0 then goto loop
This program will be assembled into six 32-bit instructions in the MIPS ISA.
What does the program do?
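A rough C rendering of the loop may help in answering that question. This is
only a sketch, assuming (as described in the MIPS example later in this
lecture) that r0 is the register that always reads as zero:
#include <stdio.h>

/* Sketch: the assembly loop above, transliterated into C.
   Assumes r0 always holds zero, so the movz always copies r3 into r2. */
int main(void) {
    long r1 = 0 | 16;    /* ori r1,r0,16 */
    long r2 = 0 | 1;     /* ori r2,r0,1  */
    long r3;
    do {
        r3 = r2 + r2;    /* dadd r3,r2,r2 */
        r2 = r3;         /* movz r2,r3,r0  (r0 == 0, so the move happens) */
        r1 = r1 - 1;     /* dsubi r1,r1,1 */
    } while (r1 != 0);   /* bnez r1,loop  */
    printf("r2 = %ld\n", r2);
    return 0;
}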
Types of Instruction Sets
There are three main types of instruction sets:
- Stack
- Accumulator
- General-purpose register
These days, the first two are really only of historical interest. It turns
out that the Java virtual machine is like a stack-based ISA, only at
a higher level of representation. The x86 architecture is based on an
ancient instruction set that had some aspects of accumulator architecture;
it is classified as a special-purpose register machine.
Of the general-purpose register (GPR) architectures, there are two
main types:
- Load/Store (or register-register). Only load and store instructions
access main memory. The rest of the instructions act only on registers.
- Register-memory. Any instruction may access main memory.
So-called Reduced Instruction Set Computing (RISC) architectures are
load/store architectures. Examples: SPARC, MIPS, Alpha AXP, PowerPC.
Complex Instruction Set Computing (CISC) architectures are usually
register-memory architectures. Examples: VAX, x86, MC68000.
Memory Addressing
Addressing memory is one of the most important functions a processor performs,
and how memory is addressed is specified by the ISA.
Most computer systems divide memory into 8-bit bytes. The ISA decides
how to format these bytes into larger structures such as 32-bit
integers. Some of the important aspects of this organization are:
- Endianness. Memory can be accessed as either Big Endian
or Little Endian. That means a multi-byte value such
as an integer or an address may be stored with its most significant
byte first (Big Endian) or last (Little Endian). A short C check
appears after this list.
- Alignment. Some ISAs, for implementation reasons, require that memory
accesses be aligned. For instance, on the Alpha, an access to
an 8-byte quadword must be through an address divisible by 8; otherwise,
the offending load or store instruction will generate an exception.
- Addressing modes. Addressing mode refers to the way in which a
machine instruction accesses memory. The machine instruction contains
some data that is used to compute the effective address that
will be used in a transaction with the memory system.
Some examples:
- PC-relative addressing. An immediate offset in the instruction
is added to the program counter register to yield the effective
address.
- Displacement addressing. An immediate offset in the instruction
is added to a specified register to yield the effective address.
- Immediate addressing. An immediate value is specified.
- Indirect addressing (in combination with other modes).
A first effective address is used to fetch a value from memory.
That value is used to form a second effective address.
There are many other, more exotic addressing modes (e.g. autoincrement),
but modern ISAs usually provide just a few regular addressing modes.
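As promised above, here is a small C sketch that illustrates endianness: it
stores a 32-bit value and inspects its individual bytes in memory. The output
depends on the machine it runs on.
#include <stdio.h>
#include <stdint.h>

/* Store a 32-bit integer and examine its bytes to see whether the
   least significant byte (0x44) or the most significant byte (0x11)
   comes first in memory. */
int main(void) {
    uint32_t value = 0x11223344;
    unsigned char *bytes = (unsigned char *) &value;

    printf("bytes in memory: %02x %02x %02x %02x\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
    if (bytes[0] == 0x44)
        printf("Little Endian: least significant byte first\n");
    else
        printf("Big Endian: most significant byte first\n");
    return 0;
}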
Operand Types
An operand is a value that an instruction operates on. An instruction's
type and addressing modes together determine its operands. What kinds of
operands are there?
- Integers. Usually 8-bit (characters), 16-bit (words), 32-bit (doublewords),
64-bit (quadwords). The terminology may differ from one ISA to another.
- Single and double precision floating point numbers, usually 32-bit and
64-bit respectively.
- Binary-coded decimal. A single decimal digit occupies one half of a
byte. Sometimes called packed decimal because decimal digits
are packed together into bytes.
- Strings. Some ISAs support variable-length strings of bytes as a
primitive data type in memory.
- Vectors of primitive types. Some (weird) ISAs support fixed or
variable length vectors of primitive types. Examples: CRAY vector
processors, MMX extensions to x86.
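To make the common sizes concrete, here is a short C sketch. It assumes a
typical modern system where float and double are the 32-bit and 64-bit IEEE
formats.
#include <stdio.h>
#include <stdint.h>

/* Print the sizes, in bytes, of common primitive operand types. */
int main(void) {
    printf("8-bit integer:  %zu byte\n",  sizeof(int8_t));
    printf("16-bit integer: %zu bytes\n", sizeof(int16_t));
    printf("32-bit integer: %zu bytes\n", sizeof(int32_t));
    printf("64-bit integer: %zu bytes\n", sizeof(int64_t));
    printf("single precision float: %zu bytes\n", sizeof(float));
    printf("double precision float: %zu bytes\n", sizeof(double));
    return 0;
}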
Types of Instructions
- Data transfer instructions. Load data from memory into registers,
or store data from registers into memory. Transfer data between different
kinds of special-purpose registers.
- Arithmetic and logical instructions. Perform arithmetic (e.g. add,
subtract, multiply) and logic (e.g. AND, OR, XOR), as well as comparisons
(less than, greater than, compare).
- Control transfer instructions. Instructions that affect the value of
the program counter register. Unconditional jump, procedure call, return,
conditional branch, indirect jump, software interrupt (e.g. trap). Also,
instructions that do something based on some condition, e.g. predicated
instructions.
- Floating point instructions. Traditionally, instructions that deal
with floating point values are given separate treatment. Add, multiply,
scientific calculator-type functions (e.g. tangent, square root), convert
between integer, single, and double precision.
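A tiny C function can exercise all of these categories at once. This is just
an illustrative sketch; the function and data are made up.
#include <stdio.h>

/* Exercises the instruction categories above: data transfer (loading
   x[i] from memory), arithmetic and logic (the add, the index increment,
   the loop comparison), control transfer (the conditional branch of the
   loop, plus the call and return), and floating point (the convert,
   add, and multiply involving total). */
double half_sum(const int *x, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total = total + x[i];
    }
    return total * 0.5;
}

int main(void) {
    int data[4] = {1, 2, 3, 4};
    printf("%f\n", half_sum(data, 4));
    return 0;
}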
Instruction Encoding
How are instruction types, operands, addressing modes, etc. communicated
to the hardware? The ISA specifies a binary encoding of instructions.
The assembler encodes programs using this encoding, and the microarchitecture
reads and executes the encoded program. The MIPS instruction set is a
good example.
Example: The MIPS instruction set
Every instruction in the MIPS instruction set is 32 bits long. Let's number
the bits from 0 to 31, with 31 being the most significant bit. There are 32
integer registers. Most of them are general purpose. Register R0 is always
set to 0. The first six bits, bits 31-26, specify an opcode giving
information about what that instruction is supposed to do. MIPS registers
and addresses are 64-bit. MIPS is byte-addressable, requires aligned
accesses, and can be switched to either Big Endian or Little Endian.
There are three general types of instructions:
- I-type instructions. Instructions with immediate operands. rt := rs
op immediate.
| 6-bit opcode | 5-bit rs | 5-bit rt | 16-bit immediate |
- R-type instructions. Register-register arithmetic and logic instructions.
| 6-bit opcode | 5-bit rs | 5-bit rt | 5-bit rd | 5-bit shamt | 6-bit funct |
- J-type instructions. Jump to PC-relative address. Conditional jumps,
jump-and-link, trap.
| 6-bit opcode | 26-bit offset added to PC |
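As a sketch of how software (a disassembler, for instance) might read these
formats, here is some C code that extracts the fields of a 32-bit instruction
word. The example word is made up for illustration.
#include <stdio.h>
#include <stdint.h>

/* Pull apart the fields of a 32-bit MIPS instruction word.
   Bit 31 is the most significant bit, as in the text. */
int main(void) {
    uint32_t insn = 0x014B4820u;                /* hypothetical instruction word */

    uint32_t opcode = (insn >> 26) & 0x3F;      /* bits 31-26 */
    uint32_t rs     = (insn >> 21) & 0x1F;      /* bits 25-21 */
    uint32_t rt     = (insn >> 16) & 0x1F;      /* bits 20-16 */
    uint32_t rd     = (insn >> 11) & 0x1F;      /* bits 15-11 (R-type) */
    uint32_t shamt  = (insn >>  6) & 0x1F;      /* bits 10-6  (R-type) */
    uint32_t funct  =  insn        & 0x3F;      /* bits 5-0   (R-type) */
    uint32_t imm    =  insn        & 0xFFFF;    /* bits 15-0  (I-type) */
    uint32_t offset =  insn        & 0x3FFFFFF; /* bits 25-0  (J-type) */

    printf("opcode=%u rs=%u rt=%u rd=%u shamt=%u funct=%u\n",
           opcode, rs, rt, rd, shamt, funct);
    printf("imm=%u offset=%u\n", imm, offset);
    return 0;
}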
Compilers
A compiler is a program that translates programs from a high-level
language into machine instructions that can be directly executed by
the CPU. The compiler has a large and, these days, increasing impact on
the performance of computer systems.
Structure of an Optimizing Compiler
An optimizing compiler puts a great deal of effort into improving
the quality of the generated machine code. The compiler has the following
stages:
- Front end. The front end reads in the source code written by the
programmer. It performs lexical analysis and parsing
to get the code into an intermediate form that can be easily worked with in
the rest of the compiler. The result of the front end is an intermediate
representation of the program code, e.g. an abstract syntax tree
or three-address code.
- High-level optimizations. This stage performs high-level
code-improving transformations on the intermediate representation.
The transformations at this stage are mostly machine-independent, i.e.,
they require little or no knowledge of the ISA. Some examples (two of these
transformations are illustrated in the sketches after this list of stages):
- Constant propagation, constant folding
- Redundancy elimination
- Loop transformations
- Procedure integration (automatic inlining)
- Dead and unreachable code elimination
- Low-level optimizations. At this stage, the code is transformed into
a lower-level intermediate representation that has more information about
the ISA. Examples of possible optimizations:
- Strength reduction
- Machine idioms
- Register allocation
- Cache-conscious loop transformations
- Code generation. At this stage, assembly-language code is generated
from the intermediate representation. Some optimizations may be performed
at this stage:
- Code placement
- Low-level feedback-directed optimization, e.g. branch hints
- Instruction selection (e.g. mov 0 vs. xor)
- Peephole optimizations
- Assembly. The assembler transforms the assembly language program into
machine instructions ready to be loaded. Certain alignment optimizations
may be performed at this time.
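To make these stages more concrete, here are two small C sketches; the
variable names and constants are invented for illustration. The first mimics
the kind of three-address intermediate representation a front end might
produce, written as C in which each statement has at most one operator:
#include <stdio.h>

/* The statement  a = b * c + d;  broken into three-address form:
   each line assigns to one name and applies at most one operator.
   t1 and t2 stand for compiler-generated temporaries. */
int main(void) {
    int b = 2, c = 3, d = 4, a;
    int t1, t2;

    t1 = b * c;    /* t1 := b * c  */
    t2 = t1 + d;   /* t2 := t1 + d */
    a  = t2;       /* a  := t2     */

    printf("a = %d\n", a);
    return 0;
}
The second gives a before/after view of two of the optimizations listed
above: constant propagation and folding at the high level, and strength
reduction of a multiply by a constant at the low level.
#include <stdio.h>

/* Before optimization: as the programmer might write it. */
unsigned bytes_needed_before(unsigned n) {
    unsigned scale = 4;               /* known constant          */
    unsigned bytes_per = scale * 2;   /* folds to the constant 8 */
    return n * bytes_per;             /* multiply by a constant  */
}

/* After optimization: roughly what the compiler might produce.
   Constant propagation and folding turn bytes_per into the literal 8,
   and strength reduction rewrites the multiply as a shift. */
unsigned bytes_needed_after(unsigned n) {
    return n << 3;                    /* n * 8 as a left shift by 3 */
}

int main(void) {
    printf("%u %u\n", bytes_needed_before(10), bytes_needed_after(10));
    return 0;
}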
Note that in some compilers, some phases may be iterated over several times,
and the same types of optimizations may be done at different levels of
intermediate representations.
Effect of Compiler on the Architecture
The computer architect must be aware of compiler technology, because the
compiler determines the mix of instructions that will be executed.
These days, architecture people and compiler people must cooperate to
achieve improved performance.
Example Problems
Here are some examples of problems that might occur when there isn't enough
communication between the hardware and software people.
- Useless instructions. If an architect designs a very cool instruction
that is hardly ever used by the compiler, something has gone wrong. The
VAX polynomial-evaluate and CALL instructions are examples. An instruction
set must be, in a sense, easy to compile for.
- Overly-ambitious or useless optimizations. The compiler writer
might think that reducing the instruction count is the most important
way to improve performance. However, the compiler writer must be aware
that on modern microarchitectures, sometimes programs that execute more
instructions can be faster. For example, multiplying by a constant can
be done with a single multiply instruction or several shifts and adds.
The latter may be faster, depending on microarchitectural details.
On the Alpha, for example, inserting NOP (no operation) instructions
in just the right places can lead to more even usage of the functional
units, speeding things up.
- Not exposing enough of the microarchitecture. Modern microarchitectures
with caches, speculation, out-of-order execution, etc. are increasingly
complex. It is very difficult for a compiler writer to come up with a
reasonable performance model without microarchitectural details. If more
of these details are exposed, either through documentation or explicit
changes in the ISA (e.g. performance counters, hint instructions), the
compiler writer will be able to better improve the generated code.
- Exposing too much of the microarchitecture. Encoding microarchitectural
details into the ISA means that every subsequent implementation of the
ISA must support these details or risk not being backward-compatible.
A good example is branch delay slots.
Note that those last two items might look inconsistent with one another.
Compiler writers and architects should get together and decide what parts
of the implementation should be exposed and what parts are best left hidden.
Example Compiler Output
Consider the following C program:
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
int a;
int main () {
    int i;
    a = 0;
    for (i = 0; i < 100; i++) {
        a = a + i;
    }
    printf ("%d\n", a);
    exit (0);
}
With GCC version 2.95.4 on my Pentium 4, it compiles into something like this:
.LC0:
.string "%d\n"
.comm a,4,4
main:
movl $0,a # store a 0 to memory location a
xorl %edx,%edx # edx := edx xor edx, i.e., edx := 0
.L21:
movl %edx,%eax # eax := edx
addl a,%eax # eax := eax + a
movl %eax,a # a := eax
incl %edx # edx := edx + 1
cmpl $99,%edx # compare edx to 99
jle .L21 # if less than or equal then goto .L21
pushl %eax # push eax onto the stack
pushl $.LC0 # push the address of "%d\n" onto the stack
call printf # call the printf function
pushl $0 # push a 0 onto the stack
call exit # call the exit function
Let's go through this program and see what's going on.
- What is each instruction doing?
- What is the instruction count? That is, how many instructions are
executed when this program runs?
- Suppose branch instructions take 2 cycles, memory instructions
take 3 cycles, and all other instructions take 1 cycle. How many cycles
are consumed by this program?
- What is the IPC (instructions per cycle) for this program?