Lecture 2: Instruction Set Architectures and Compilers
Instruction Set Architectures
An Instruction Set Architecture (ISA) is an agreement about how software
will communicate with the processor. A typical ISA describes a machine with
the following features:
- A flat 32-bit address space
- A set of registers available to the programmer.
- A program counter register through which instructions are fetched,
initialized to some documented value.
- A set of external objects that can generate interrupts.
A description of the ISA for a processor answers the following questions:
- What instructions are available?
- What addressing modes are available?
- What is the format of data?
- How many and what kind of registers are available?
- What condition codes, if any, are defined?
- How are exceptions handled?
- How are interrupts handled?
What are some things not specified by the ISA?
- How fast will a particular instruction go?
- How is an instruction implemented?
- What are procedure calling conventions?
- What are cache replacement policies?
- What happens on a page fault?
Example
Let's look at an example of an ISA. We'll begin by looking at a program
written in assembly language, a language of mnemonic instructions.
There is usually a one-to-one correspondence between assembly-language
instructions and machine instructions.
ori r1,r0,16 // OR immediate; r1 := r0 OR 16
ori r2,r0,1 // r2 := r0 OR 1
loop:
dadd r3,r2,r2 // r3 := r2 + r2
movz r2,r3,r0 // r2 := r3 if r0 == 0
dsubi r1,r1,1 // r1 := r1 - 1
bnez r1,loop // if r1 != 0 then goto loop
This program will be assembled into six 32-bit instructions in the MIPS ISA.
What does the program do?
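A rough C rendering of the loop may help in answering that question. This is
only a sketch, assuming (as described in the MIPS example later in this
lecture) that r0 is the register that always reads as zero:
#include <stdio.h>

/* Sketch: the assembly loop above, transliterated into C.
   Assumes r0 always holds zero, so the movz always copies r3 into r2. */
int main(void) {
    long r1 = 0 | 16;    /* ori r1,r0,16 */
    long r2 = 0 | 1;     /* ori r2,r0,1  */
    long r3;
    do {
        r3 = r2 + r2;    /* dadd r3,r2,r2 */
        r2 = r3;         /* movz r2,r3,r0  (r0 == 0, so the move happens) */
        r1 = r1 - 1;     /* dsubi r1,r1,1 */
    } while (r1 != 0);   /* bnez r1,loop  */
    printf("r2 = %ld\n", r2);
    return 0;
}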
Types of Instruction Sets
There are three main types of instruction sets:
- Stack
- Accumulator
- General-purpose register
These days, the first two are really only of historical interest. It turns
out that the Java virtual machine is like a stack-based ISA, only at
a higher level of representation. The x86 architecture is based on an
ancient instruction set that had some aspects of accumulator architecture;
it is classified as a special-purpose register machine.
Of the general-purpose register (GPR) architectures, there are two
main types:
- Load/Store (or register-register). Only load and store instructions
access main memory. The rest of the instructions act only on registers.
- Register-memory. Any instruction may access main memory.
So-called Reduced Instruction Set Computing (RISC) architectures are
load/store architectures. Examples: SPARC, MIPS, Alpha AXP, PowerPC.
Complex Instruction Set Computing (CISC) architectures are usually
register-memory architectures. Examples: VAX, x86, MC68000.
Memory Addressing
Addressing memory is one of the most important functions a processor performs,
and how memory is addressed is specified by the ISA.
Most computer systems divide memory into 8-bit bytes. The ISA decides
how to format these bytes into larger structures such as 32-bit
integers. Some of the important aspects of this organization are:
- Endianness. Memory can be accessed as either Big Endian
or Little Endian. That means a multi-byte value such
as an integer or an address may be stored with its most significant
byte first (Big Endian) or last (Little Endian). A short C check
appears after this list.
- Alignment. Some ISAs, for implementation reasons, require that memory
accesses be aligned. For instance, on the Alpha, an access to
an 8-byte quadword must be through an address divisible by 8; otherwise,
the offending load or store instruction will generate an exception.
- Addressing modes. Addressing mode refers to the way in which a
machine instruction accesses memory. The machine instruction contains
some data that is used to compute the effective address that
will be used in a transaction with the memory system.
Some examples:
- PC-relative addressing. An immediate offset in the instruction
is added to the program counter register to yield the effective
address.
- Displacement addressing. An immediate offset in the instruction
is added to a specified register to yield the effective address.
- Immediate addressing. An immediate value is specified.
- Indirect addressing (in combination with other modes).
A first effective address is used to fetch a value from memory.
That value is used to form a second effective address.
There are many other, more exotic addressing modes (e.g. autoincrement),
but modern ISAs usually provide just a few regular addressing modes.
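As promised above, here is a small C sketch that illustrates endianness: it
stores a 32-bit value and inspects its individual bytes in memory. The output
depends on the machine it runs on.
#include <stdio.h>
#include <stdint.h>

/* Store a 32-bit integer and examine its bytes to see whether the
   least significant byte (0x44) or the most significant byte (0x11)
   comes first in memory. */
int main(void) {
    uint32_t value = 0x11223344;
    unsigned char *bytes = (unsigned char *) &value;

    printf("bytes in memory: %02x %02x %02x %02x\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
    if (bytes[0] == 0x44)
        printf("Little Endian: least significant byte first\n");
    else
        printf("Big Endian: most significant byte first\n");
    return 0;
}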
Operand Types
An operand is a value that an instruction operates on. An instruction's
type and addressing modes together determine its operands. What kinds of
operands are there?
- Integers. Usually 8-bit (characters), 16-bit (words), 32-bit (doublewords),
64-bit (quadwords). The terminology may differ from one ISA to another.
- Single and double precision floating point numbers, usually 32-bit and
64-bit respectively.
- Binary-coded decimal. A single decimal digit occupies one half of a
byte. Sometimes called packed decimal because decimal digits
are packed together into bytes.
- Strings. Some ISAs support variable-length strings of bytes as a
primitive data type in memory.
- Vectors of primitive types. Some (weird) ISAs support fixed or
variable length vectors of primitive types. Examples: CRAY vector
processors, MMX extensions to x86.
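To make the common sizes concrete, here is a short C sketch. It assumes a
typical modern system where float and double are the 32-bit and 64-bit IEEE
formats.
#include <stdio.h>
#include <stdint.h>

/* Print the sizes, in bytes, of common primitive operand types. */
int main(void) {
    printf("8-bit integer:  %zu byte\n",  sizeof(int8_t));
    printf("16-bit integer: %zu bytes\n", sizeof(int16_t));
    printf("32-bit integer: %zu bytes\n", sizeof(int32_t));
    printf("64-bit integer: %zu bytes\n", sizeof(int64_t));
    printf("single precision float: %zu bytes\n", sizeof(float));
    printf("double precision float: %zu bytes\n", sizeof(double));
    return 0;
}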
Types of Instructions
- Data transfer instructions. Load data from memory into registers,
or store data from registers into memory. Transfer data between different
kinds of special-purpose registers.
- Arithmetic and logical instructions. Perform arithmetic (e.g. add,
subtract, multiply) and logic (e.g. AND, OR, XOR), as well as comparisons
(less than, greater than, compare).
- Control transfer instructions. Instructions that affect the value of
the program counter register. Unconditional jump, procedure call, return,
conditional branch, indirect jump, software interrupt (e.g. trap). Also,
instructions that do something based on some condition, e.g. predicated
instructions.
- Floating point instructions. Traditionally, instructions that deal
with floating point values are given separate treatment. Add, multiply,
scientific calculator-type functions (e.g. tangent, square root), convert
between integer, single, and double precision.
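A tiny C function can exercise all of these categories at once. This is just
an illustrative sketch; the function and data are made up.
#include <stdio.h>

/* Exercises the instruction categories above: data transfer (loading
   x[i] from memory), arithmetic and logic (the add, the index increment,
   the loop comparison), control transfer (the conditional branch of the
   loop, plus the call and return), and floating point (the convert,
   add, and multiply involving total). */
double half_sum(const int *x, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total = total + x[i];
    }
    return total * 0.5;
}

int main(void) {
    int data[4] = {1, 2, 3, 4};
    printf("%f\n", half_sum(data, 4));
    return 0;
}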
Instruction Encoding
How are instruction types, operands, addressing modes, etc. communicated
to the hardware? The ISA specifies a binary encoding of instructions.
The assembler encodes programs using this encoding, and the microarchitecture
reads and executes the encoded program. The MIPS instruction set is a
good example.
Example: The MIPS instruction set
Every instruction in the MIPS instruction set is 32 bits long. Let's number
the bits from 0 to 31, with 31 being the most significant bit. There are 32
integer registers. Most of them are general purpose. Register R0 is always
set to 0. The first six bits, bits 31-26, specify an opcode giving
information about what that instruction is supposed to do. MIPS registers
and addresses are 64-bit. MIPS is byte-addressable, requires aligned
accesses, and can be switched to either Big Endian or Little Endian.
There are three general types of instructions:
- I-type instructions. Instructions with immediate operands. rt := rs
op immediate.
| 6-bit opcode | 5-bit rs | 5-bit rt | 16-bit immediate |
- R-type instructions. Register-register arithmetic and logic instructions.
| 6-bit opcode | 5-bit rs | 5-bit rt | 5-bit rd | 5-bit shamt | 6-bit funct |
- J-type instructions. Jump to PC-relative address. Conditional jumps,
jump-and-link, trap.
| 6-bit opcode | 26-bit offset added to PC |
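As a sketch of how software (a disassembler, for instance) might read these
formats, here is some C code that extracts the fields of a 32-bit instruction
word. The example word is made up for illustration.
#include <stdio.h>
#include <stdint.h>

/* Pull apart the fields of a 32-bit MIPS instruction word.
   Bit 31 is the most significant bit, as in the text. */
int main(void) {
    uint32_t insn = 0x014B4820u;                /* hypothetical instruction word */

    uint32_t opcode = (insn >> 26) & 0x3F;      /* bits 31-26 */
    uint32_t rs     = (insn >> 21) & 0x1F;      /* bits 25-21 */
    uint32_t rt     = (insn >> 16) & 0x1F;      /* bits 20-16 */
    uint32_t rd     = (insn >> 11) & 0x1F;      /* bits 15-11 (R-type) */
    uint32_t shamt  = (insn >>  6) & 0x1F;      /* bits 10-6  (R-type) */
    uint32_t funct  =  insn        & 0x3F;      /* bits 5-0   (R-type) */
    uint32_t imm    =  insn        & 0xFFFF;    /* bits 15-0  (I-type) */
    uint32_t offset =  insn        & 0x3FFFFFF; /* bits 25-0  (J-type) */

    printf("opcode=%u rs=%u rt=%u rd=%u shamt=%u funct=%u\n",
           opcode, rs, rt, rd, shamt, funct);
    printf("imm=%u offset=%u\n", imm, offset);
    return 0;
}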
Compilers
A compiler is a program that translates programs from a high-level
language into machine instructions that can be directly executed by
the CPU. The compiler has a large and, these days, increasing impact on
the performance of computer systems.
Structure of an Optimizing Compiler
An optimizing compiler puts a great deal of effort into improving
the quality of the generated machine code. The compiler has the following
stages:
- Front end. The front end reads in the source code written by the
programmer. It performs lexical analysis and parsing
to get the code into an intermediate form that can be easily worked with in
the rest of the compiler. The result of the front end is an intermediate
representation of the program code, e.g. an abstract syntax tree
or three-address code.
- High-level optimizations. This stage performs high-level
code-improving transformations on the intermediate representation.
The transformations at this stage are mostly machine-independent, i.e.,
they require little or no knowledge of the ISA. Some examples (two of these
transformations are illustrated in the sketches after this list of stages):
- Constant propagation, constant folding
- Redundancy elimination
- Loop transformations
- Procedure integration (automatic inlining)
- Dead and unreachable code elimination
- Low-level optimizations. At this stage, the code is transformed into
a lower-level intermediate representation that has more information about
the ISA. Examples of possible optimizations:
- Strength reduction
- Machine idioms
- Register allocation
- Cache-conscious loop transformations
- Code generation. At this stage, assembly-language code is generated
from the intermediate representation. Some optimizations may be performed
at this stage:
- Code placement
- Low-level feedback-directed optimization, e.g. branch hints
- Instruction selection (e.g. mov 0 vs. xor)
- Peephole optimizations
- Assembly. The assembler transforms the assembly language program into
machine instructions ready to be loaded. Certain alignment optimizations
may be performed at this time.
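To make these stages more concrete, here are two small C sketches; the
variable names and constants are invented for illustration. The first mimics
the kind of three-address intermediate representation a front end might
produce, written as C in which each statement has at most one operator:
#include <stdio.h>

/* The statement  a = b * c + d;  broken into three-address form:
   each line assigns to one name and applies at most one operator.
   t1 and t2 stand for compiler-generated temporaries. */
int main(void) {
    int b = 2, c = 3, d = 4, a;
    int t1, t2;

    t1 = b * c;    /* t1 := b * c  */
    t2 = t1 + d;   /* t2 := t1 + d */
    a  = t2;       /* a  := t2     */

    printf("a = %d\n", a);
    return 0;
}
The second gives a before/after view of two of the optimizations listed
above: constant propagation and folding at the high level, and strength
reduction of a multiply by a constant at the low level.
#include <stdio.h>

/* Before optimization: as the programmer might write it. */
unsigned bytes_needed_before(unsigned n) {
    unsigned scale = 4;               /* known constant          */
    unsigned bytes_per = scale * 2;   /* folds to the constant 8 */
    return n * bytes_per;             /* multiply by a constant  */
}

/* After optimization: roughly what the compiler might produce.
   Constant propagation and folding turn bytes_per into the literal 8,
   and strength reduction rewrites the multiply as a shift. */
unsigned bytes_needed_after(unsigned n) {
    return n << 3;                    /* n * 8 as a left shift by 3 */
}

int main(void) {
    printf("%u %u\n", bytes_needed_before(10), bytes_needed_after(10));
    return 0;
}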
Note that in some compilers, some phases may be iterated over several times,
and the same types of optimizations may be done at different levels of
intermediate representations.
Effect of Compiler on the Architecture
The computer architect must be aware of compiler technology, because the
compiler determines the mix of instructions that will be executed.
These days, architecture people and compiler people must cooperate to
achieve improved performance.
Example Problems
Here are some examples of problems that might occur when there isn't enough
communication between the hardware and software people.
- Useless instructions. If an architect designs a very cool instruction
that is hardly ever used by the compiler, something has gone wrong. The
VAX polynomial-evaluate and CALL instructions are examples. An instruction
set must be, in a sense, easy to compile for.
- Overly-ambitious or useless optimizations. The compiler writer
might think that reducing the instruction count is the most important
way to improve performance. However, the compiler writer must be aware
that on modern microarchitectures, sometimes programs that execute more
instructions can be faster. For example, multiplying by a constant can
be done with a single multiply instruction or several shifts and adds.
The latter may be faster, depending on microarchitectural details.
On the Alpha, for example, inserting NOP (no operation) instructions
in just the right places can lead to more even usage of the functional
units, speeding things up.
- Not exposing enough of the microarchitecture. Modern microarchitectures
with caches, speculation, out-of-order execution, etc. are increasingly
complex. It is very difficult for a compiler writer to come up with a
reasonable performance model without microarchitectural details. If more
of these details are exposed, either through documentation or explicit
changes in the ISA (e.g. performance counters, hint instructions), the
compiler writer will be able to better improve the generated code.
- Exposing too much of the microarchitecture. Encoding microarchitectural
details into the ISA means that every subsequent implementation of the
ISA must support these details or risk not being backward-compatible.
A good example is branch delay slots.
Note that those last two items might look inconsistent with one another.
Compiler writers and architects should get together and decide what parts
of the implementation should be exposed and what parts are best left hidden.
Example Compiler Output
Consider the following C program:
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
int a;
int main () {
    int i;
    a = 0;
    for (i = 0; i < 100; i++) {
        a = a + i;
    }
    printf ("%d\n", a);
    exit (0);
}
With GCC version 2.95.4 on my Pentium 4, it compiles into something like this:
.LC0:
.string "%d\n"
.comm a,4,4
main:
movl $0,a # store a 0 to memory location a
xorl %edx,%edx # edx := edx xor edx, i.e., edx := 0
.L21:
movl %edx,%eax # eax := edx
addl a,%eax # eax := eax + a
movl %eax,a # a := eax
incl %edx # edx := edx + 1
cmpl $99,%edx # compare edx to 99
jle .L21 # if less than or equal then goto .L21
pushl %eax # push eax onto the stack
pushl $.LC0 # push the address of "%d\n" onto the stack
call printf # call the printf function
pushl $0 # push a 0 onto the stack
call exit # call the exit function
Let's go through this program and see what's going on.
- What is each instruction doing?
- What is the instruction count? That is, how many instructions are
executed when this program runs?
- Suppose branch instructions take 2 cycles, memory instructions
take 3 cycles, and all other instructions take 1 cycle. How many cycles
are consumed by this program?
- What is the IPC (instructions per cycle) for this program?