CSCE 614 Fall 2021 Project

Branch Misprediction Analysis

For this project, you will analyze frequently mispredicted branches in benchmarks to determine where they come from and why they are mispredicted.

Infrastructure

Get this big tar.gz file: 614project.tar.gz. You can only download it from within the TAMU network or over the TAMU VPN. Unpack it into a directory that has at least about 3GB free, since it contains many large files.

Here you will find a branch prediction simulator in 'src', branch traces in 'trace', and the executable, source, and object files for the benchmarks that generated the traces in 'bench'.

There are three branch predictors provided: TAGE-SC-L, Multiperspective Perceptron, and gshare. To compile them, go into src and type one of these:

make config-tagescl
make config-multiperspective
make gshare

and then type "make" again. Run the simulator using the 'predict' binary on one of the traces in 'trace'. There are many traces for each benchmark, each one representing a different region of interest of 100 million instructions.

The Project

Each student will find a distinct collection of 10 branches from the different benchmarks that are "hard to predict." Each such branch should be executed at least 10,000 times in one of the traces and should be mispredicted at least 10% of the time by TAGE-SC-L. You will need to modify the simulator to find hard-to-predict branches. The obvious approach is to keep a map from branch addresses to pairs of execution counts and misprediction counts, then print the addresses of branches matching the "hard to predict" criteria.

You should claim your 10 branches by posting the name of the benchmark and the hexadecimal address of each branch to a common bulletin board that Pritam will set up. That way, everyone is working on different branches.

You will then determine, for each of your branches, where in the source code it appears. Do this by disassembling the benchmark's executable and using the disassembly to map the branch back to the benchmark's source code. I recommend using gdb for the disassembly, and nm to inspect the executables' symbol tables so you can find the areas of the source code corresponding to the disassembly.

Once you have found the source of a hard-to-predict branch, try to determine why it is hard to predict. Is it hard to predict only for TAGE-SC-L, or for the other two predictors as well? What is the misprediction rate for that branch on the other predictors?
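The map-based approach mentioned above could be sketched roughly as follows. This is only an illustration: BranchProfile, BranchStats, and record are hypothetical names, not part of the provided simulator, and you would need to find the place in 'src' where each prediction is compared with the actual outcome and call something like record from there.

```cpp
#include <algorithm>
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Per-branch counters: how often the branch executed and how often it was mispredicted.
struct BranchStats {
    uint64_t executions = 0;
    uint64_t mispredictions = 0;
    double rate() const {
        return executions ? static_cast<double>(mispredictions) / executions : 0.0;
    }
};

// Hypothetical helper; the names here are assumptions, not the simulator's API.
class BranchProfile {
public:
    // Call once per conditional branch, after the prediction and the outcome are both known.
    void record(uint64_t pc, bool predicted_taken, bool actually_taken) {
        BranchStats &s = stats_[pc];  // creates a zeroed entry on first sight
        ++s.executions;
        if (predicted_taken != actually_taken) ++s.mispredictions;
    }

    // Addresses meeting both thresholds, ordered by misprediction count (highest first).
    std::vector<uint64_t> hard_to_predict(uint64_t min_execs = 10000,
                                          double min_rate = 0.10) const {
        assert(min_rate >= 0.0 && min_rate <= 1.0);
        std::vector<uint64_t> out;
        for (const auto &kv : stats_)
            if (kv.second.executions >= min_execs && kv.second.rate() >= min_rate)
                out.push_back(kv.first);
        std::sort(out.begin(), out.end(), [this](uint64_t a, uint64_t b) {
            const BranchStats &sa = stats_.at(a);
            const BranchStats &sb = stats_.at(b);
            if (sa.mispredictions != sb.mispredictions)
                return sa.mispredictions > sb.mispredictions;
            return a < b;  // tie-break on address for a stable listing
        });
        return out;
    }

    // Print one line per qualifying branch, with the address in hexadecimal.
    void print_hard_to_predict() const {
        for (uint64_t pc : hard_to_predict()) {
            const BranchStats &s = stats_.at(pc);
            std::printf("0x%" PRIx64 " execs=%" PRIu64 " mispred=%" PRIu64 " rate=%.2f%%\n",
                        pc, s.executions, s.mispredictions, 100.0 * s.rate());
        }
    }

    const std::unordered_map<uint64_t, BranchStats> &stats() const { return stats_; }

private:
    std::unordered_map<uint64_t, BranchStats> stats_;
};
```

Calling print_hard_to_predict once at the end of a trace would list the candidates. Keeping the raw counts rather than only the rate makes it easy to report both the execution count and the misprediction percentage later.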

What to Turn In

Your writeup should consist of 10 sections, one for each branch. Each section should include the following:

The name of the benchmark.
The name of the trace file.
The hexadecimal address of the hard-to-predict branch.
The number of times the branch is executed in that trace file.
The percentage of times the branch is mispredicted in that trace file, for each of the three predictors.
An assembly listing including that branch, with the five instructions before and the five instructions after that branch.
The high-level source listing of that branch, with sufficiently many lines before and after to provide context for what the purpose of the branch is in that benchmark.
In your own words, an explanation of what the purpose of this branch is in the context of the source code.
Your well-reasoned thoughts on why this branch is hard to predict for the various predictors. Provide a good answer for at least gshare and, if possible, for the other two predictors. If you can't tell why this branch is hard for at least gshare, try a different branch.
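When reasoning about gshare, it may help to keep a textbook version of the predictor in mind. The sketch below is an assumption-laden illustration, not the code in 'src': the provided gshare may use a different table size, history length, or hash. In the textbook form, the branch address is XORed with a global history of recent outcomes to index a table of 2-bit saturating counters, so a branch tends to be hard when its outcome does not correlate with the recent history the index can capture, or when other branches map to the same counters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Textbook gshare sketch (hypothetical class; parameters are illustrative assumptions).
class GshareSketch {
public:
    explicit GshareSketch(unsigned index_bits = 14)
        : mask_((uint64_t{1} << index_bits) - 1),
          counters_(std::size_t{1} << index_bits, 1),  // start at "weakly not taken"
          history_(0) {
        assert(index_bits >= 1 && index_bits <= 30);
    }

    // Predict taken when the 2-bit counter is in one of its two upper states.
    bool predict(uint64_t pc) const { return counters_[index(pc)] >= 2; }

    // Train the counter selected by the current history, then shift in the outcome.
    void update(uint64_t pc, bool taken) {
        uint8_t &c = counters_[index(pc)];
        if (taken) {
            if (c < 3) ++c;
        } else {
            if (c > 0) --c;
        }
        history_ = ((history_ << 1) | (taken ? 1u : 0u)) & mask_;
    }

private:
    // The defining step of gshare: XOR the address with the global history.
    std::size_t index(uint64_t pc) const {
        return static_cast<std::size_t>((pc ^ history_) & mask_);
    }

    uint64_t mask_;
    std::vector<uint8_t> counters_;
    uint64_t history_;
};
```

With this model, a strictly alternating branch is learned quickly because each history pattern selects its own counter, whereas a branch whose direction depends on loaded data gives the counters nothing consistent to learn.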

This project will be due sometime toward the end of the semester; the exact timing will be announced in a later class. The contents of this page may change as we make (or fail to make) progress on the project.