CS 5513 Fall 2008, Homework 4

Part I of this homework is due at the beginning of class, October 15, 2008. Part II of this homework is due at the beginning of class, October 29, 2008.

Cache Simulators

This assignment consists of two programming problems. The first problem is to modify the xocolatl emulator to include a cache simulator. The second is to write a C program that can determine the configuration of the L1 and L2 caches for some x86 computers to which you will be given access.
  1. For this problem, modify the xocolatl emulator to include a cache simulator:
    1. Download the xocolatl-cache emulator for the MC6809 processor. Unpack it into your home directory on one of the UTCS machine with tar -xvzf xocolatl.tar.gz . Note that this is a new version of the emulator that has been augmented with hooks for a cache simulator.
    2. Modify the file xocolatl/src/projects/cache.java to simulate a cache as described below.
    3. Compile xocolatl with your modifications and run it on the benchmarks in the main xocolatl directory. Use the cache configurations specified below.
    4. Turn in your code as well as a writeup of observations and experiences including two bar graphs:
      1. One that shows the miss rate for each benchmark for each cache configuration, as well as the arithmetic mean over all benchmarks for each configuration.
      2. One that shows the CPI for each benchmark for each cache configuration, as well as the geometric mean over all benchmarks for each configuration.
      In your writeup, write about your observations of the effect of different cache parameters on miss rate and performance.
    Your code should produce no output; it should only simulate the cache. around in memory and do computation. If you like to have debug output, please disable it in the code you use to turn in your assignment.

    Programming Interface and What to Simulate

    You should modify xocolatl/projects/cache.java to simulate a cache and accurately report the number of misses.

    The file xocolatl/projects/cache.java has the following methods defined:

    The command-line arguments of the xoco program have been augmented to accept a -cache option followed by three integers and one string specifying the simulated cache's block size, associativity (i.e. number blocks per set), number of sets, and replacement policy (one of "lru", "fifo", "random", or "none" for direct-mapped). The emulator automatically calls the constructor with these parameters and translates the name of a replacement policy to a number from 0 to 3 so all you have to do is write the simulator code in cache.java.

    Hints for writing a precise simulator

    Pay attention to these hints so your simulator will compute the exact number of misses for each benchmark and configuration:

    Compiling and running xocolatl

    As in Homework 2, to compile xocolatl together with the code you have written in xocolatl/src/projects/cache.java, change to the directory xocolatl/src and type make.

    There are 5 benchmarks:

    You can run the benchmarks with the simulator using the run-benchmark script located in the xocolatl directory. The first argument to run-benchmark should be the name of the benchmark, and the second argument should be, in quotes, a command-line argument to the simulator to pass the cache configuration. For example, to run the sort benchmark with a 2-way set associative cache that has 8-byte blocks, 16 sets, and an LRU replacement policy, you would type the following from the xocolatl directory:
    ./run-benchmark sort "-cache 8 2 16 lru"
    
    (The quotes are necessary so that command-line option to the emulator won't be parsed by the run-benchmark shell script. Note also that the replacement policy e.g. "lru" should be in lowercase.)

    Each benchmark will run for a while, periodically printing output related to the running simulated program, and then the simulator will finish with statistic related to the cache simulation. In the case of running sort with the above configuration, the last few lines of output look like this:

    executed 27569591 instructions
    9915309 misses
    116242901 accesses
    8.530% miss rate
    1019100491 estimated cycles
    36.965 estimated CPI
    
    As you can see, the emulator reports the number of instructions executed, the number of simulated cache misses, the number of memory accesses, the simulated cache miss rate, an estimated number of cycles, and an estimated CPI. The estimated number of cycles and CPI are based on a machine model where every instruction takes one cycle and every cache miss has a penalty of 100 cycles (this is computed automatically by the emulator so your code doesn't need to do anything special about cycles or CPI).

    Configurations

    In your writeup, report the results of running all 5 benchmarks with the following configurations:
    1. 1KB cache, 8 byte blocks, 8-way set associativity, LRU replacement.
    2. 768 byte cache, 8 byte blocks, 6-way set associativity, LRU replacement.
    3. 1KB cache, 8 byte blocks, 4-way set associativity, LRU replacement.
    4. 1KB cache, 8 byte blocks, 2-way set associativity, LRU replacement.
    5. 1KB cache, 8 byte blocks, 4-way set associativity, random replacement.
    6. 1KB cache, 8 byte blocks, 2-way set associativity, random replacement.
    7. 1KB cache, 8 byte blocks, 4-way set associativity, fifo replacement.
    8. 1KB cache, 8 byte blocks, 2-way set associativity, fifo replacement.
    9. 1KB cache, 4 byte blocks, 4-way set associativity, LRU replacement.
    10. 1KB cache, 1 byte blocks, 4-way set associativity, LRU replacement
    11. 1KB cache, 8 byte blocks, direct-mapped
    12. 2KB cache, 8 byte blocks, direct-mapped
    13. 1KB cache, 4 byte blocks, direct-mapped
    14. 1KB cache, 8 byte blocks, fully associative, random replacement
    Clearly, you will have to do some arithmetic to translate the configurations as specified above into the parameters expected by the simulation.

    Examples

    These examples will help you develop your simulator. These are only examples; you should turn in results for all of the benchmarks for all of the configurations above. Note that, for these examples with lru, direct-mapped, and fifo replacement, your simulator should match exactly the number of misses. With random replacement, your result should be close, but depending on your implementation of randomness might be slightly different.

    Note: If you are running the emulator after having logged in remotely, you might need to set your DISPLAY environment variable to ":0.0" to fool the JVM into believing it has a screen before it can run. See the README file in xocolatl/ for information about using the video display option.

  2. Write a C program to determine the block size, associativity, and number of sets for the L1 and L2 caches of x86 machines to which you will be given access:

    Hints

    Constraints

    The output of your program should be brief and make it clear what the capacity, blocks per set, block size, and number of sets is for both L1 and L2. Your program must run in at most 20 minutes on even the slowest of the machines. Your program may not probe the computer hardware to determine the cache sizes; it must use the behavior of the unmodified sieve function as its only way of probing the machine. All of your code must be contained in a single source file named testcache.c. Your program must be in C and must compile and run on the computers mentioned above.

    Coordinating Work

    We must share all the machines among the 32 people in the class. Do your development on a different machine, e.g. the computers in the Main Lab. Use the class newsgroup, utsa.cs.5513.d, to coordinate usage of the machine. Use the program 'tin -r' on the machines in the Main Lab to access the newsgroup. Read messages others have posted, then post a message saying e.g., "I plan to use valdepenas from 4:00pm to 5:00pm on October 4." You can see if someone else is using the same machine by typing w. If you see someone else is already working, log off. Multiple users might perturb each others' results. If you run out of time on the machines then you can't test your program; this is not an excuse for a poor program as you have had three weeks to work on this assignment.

You may not work together on this assignment with other classmates or receive assistance from any person other than your professor or teaching assistant. Do not use code that isn't your own work for this assignment. Do not allow other classmates access to your code. Do not allow anyone access to a computer where your assignment is stored.

Your assignment should be nicely typed or word-processed. You must turn in your writeup on paper in class. You must turn in your Java and C code electronically by emailing your cache.java file and testcache.c file to the teaching assistant before the beginning of class. The files you email should be sufficient to run your programs, so do not email any other files. Your Java and C code will be graded not only for whether or not it works, but also for clarity and documentation.

Part I of this homework, including the Java code and writeup, is due at the beginning of class, October 15, 2008. Part II of this homework, including the C code and writeup, is due at the beginning of class, October 29, 2008.

Late assignments will not be accepted.