CSCE 614 Fall 2019 Second Exam

Write your answers on this question paper. Write your name on this paper. Your answers must be clear and legible. Use the space provided to show how you arrived at your answer for partial credit. Only the answers you write in the spaces provided will be graded; you may use the back of the pages for your own private calculations. If a question seems ambiguous, state any assumptions you need to make to work the problem.





  1. True/False. For each of the following answer 'T' for True or 'F' for False.




    1. _____ GPU stands for "general purpose unit."
    2. _____ The primary cost for warehouse-scale computers is the cost of purchasing racks for compute nodes.
    3. _____ In a multi-processor system with cache coherence, when a cache block is read, it moves from "shared" state to "exclusive" state.
    4. _____ GPU stands for "general purpose unit."
    5. _____ Thread-level parallelism is exploited by Tomasulo's algorithm.
    6. _____ In a NUMA system, a miss in the last-level cache has the same penalty for any memory address.
    7. _____ If half of a program's execution time can be parallelized perfectly with 100 processors while the other half remains sequential, then the maximum speedup that can be achieved over a single processor is 50.
    8. _____ Once the valid bit is set to "true" in a cache block, it never becomes "false" again until the process using the block terminates.
    9. _____ The exclusiu simulation infrastructure is written in Java.
    10. _____ Message passing parallel programs can only be implemented on distributed-shared-memory multiprocessors.
    11. _____ Data-level parallelism is exploited by SIMD extensions to modern ISAs.
    12. _____ Compilers can directly generate code to use SIMD extensions from C++ code.
    13. _____ Test-and-set allows using the cache coherence protocol to implement locks.
    14. _____ MIMD is less efficient than SIMD when exploiting only data-level parallelism.
    15. _____ MIMD is the most general of the parallel computing paradigms.






















  2. Consider a direct-mapped cache with 16-bit addresses, 16-byte blocks, and 16 sets. The following table gives the contents of the cache valid bits and tags:
    Cache Contents
    Index
    Valid
    Tag
    0 1 0xa7
    1 1 0xf1
    2 1 0xd9
    3 1 0x2a
    4 0 0x82
    5 1 0xc8
    6 1 0xd8
    7 1 0xfe
    8 1 0x43
    9 0 0x4d
    10 1 0x98
    11 1 0x55
    12 0 0x8c
    13 1 0xe2
    14 1 0xb3
    15 1 0x47

    1. What is the capacity of the cache in bytes?


    2. What is the size of a tag in bits?


    3. For each of the following addresses, write hit in the blank space next to it if it would be a hit, or miss if it would be a miss. For each subsequent access, assume that the previous accesses have occurred and have updated the tags and valid bits:
      
      0x47f4 _____________
      
      0x4d9a _____________
      
      0x1a45 _____________
      
      0xa702 _____________
      
      0x1a42 _____________
      
      0x1b40 _____________
      
      0x1a45 _____________
      
      











  3. Consider the following speculative Tomasulo's algorithm table. It is the solution to the same kind of problem you had on your homework, but for a machine with a slightly different MIPS configuration:

    Answer the following questions about this machine based on information you derive from the table. Each answer should be short and fit in the space provided.

    1. How many MIPS instructions can be issued per clock cycle? ______

    2. What is the latency of a floating point addition? ______

    3. How many floating point multipliers are there? ______

    4. Are the floating point multipliers pipelined (yes or no)? ______

    5. How many instructions can be committed per cycle? ______

    6. How many reorder buffer entries are there? ______

    7. How many integers adders are there? ______

    8. Is there a separate adder for effective address calculation (yes or no)? ______

    9. Assuming all loads hit in the cache, what is the latency of the cache? ______

    10. There are several structural hazards in the schedule. Give one that, if removed, would speed up the code (i.e. allow it to finish before cycle #37).