CS/CoE 1541 - Exam 2 (Spring 2019). Name: $\qquad$
## Question 1 (15 points):

Show the content of each of the caches shown below after the two memory references 35 and 44.

Use the notation [tag, M(address), ...] to describe the content of each entry. For example, [4, M(46)] indicates that the entry contains tag = 4 and the data from memory location 46. Similarly, [4, M(46), M(47)] indicates that the entry contains a block of two words from locations 46 and 47.

| (a) An 8-word, direct-mapped cache with block size = 1 word |  |
| :---: | :---: |
| Index | Content of cache |
| 0 |  |
| 1 |  |
| 2 |  |
| 3 |  |
| 4 |  |
| 5 |  |
| 6 |  |
| 7 |  |


| (c) A 64-word, 2-way set-associative cache with block size = 4 words |  |  |
| :---: | :---: | :---: |
| Set index | Way 1 | Way 2 |
| 0 |  |  |
| 1 |  |  |
| 2 |  |  |
| 3 |  |  |
| 4 |  |  |
| 5 |  |  |
| 6 |  |  |
| 7 |  |  |
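
Where each reference lands follows from splitting its address into a block address, an index (block address modulo the number of sets) and a tag (block address divided by the number of sets). A minimal Python sketch, assuming 35 and 44 are word addresses and the usual modulo mapping; `direct_mapped` and `set_associative` are hypothetical helper names.

```python
def direct_mapped(addr, num_blocks, block_size=1):
    """Return (index, tag) of a word address in a direct-mapped cache."""
    block_addr = addr // block_size           # memory block that holds the word
    return block_addr % num_blocks, block_addr // num_blocks


def set_associative(addr, cache_words, ways, block_size):
    """Return (set index, tag, word addresses in the block) for a set-associative cache."""
    num_sets = cache_words // (ways * block_size)
    block_addr = addr // block_size
    first = block_addr * block_size           # first word address of the block
    words = list(range(first, first + block_size))
    return block_addr % num_sets, block_addr // num_sets, words


# Assumption: 35 and 44 are word addresses (the exam counts memory in words).
for a in (35, 44):
    print(a, direct_mapped(a, num_blocks=8), set_associative(a, 64, ways=2, block_size=4))
```

For example, 35 maps to index 3 with tag 4 in cache (a), and to set 0 with tag 1 (the block holding words 32 to 35) in cache (c).
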

## Question 2 (15 points):

(a) Consider a 256 KB, 4-way set-associative L1 cache with 16-byte blocks in a 32-bit byte-addressable system. Complete the following sentences, assuming that a byte address is of the form $b_{31} b_{30} \ldots b_{1} b_{0}$:
(i) The total number of blocks in the cache is $\qquad$ blocks.
(ii) Of the 32 bits of the address, the number of bits used for indexing the cache is $\qquad$ and the number of bits used for tagging is $\qquad$.
(iii) The address bits used for indexing a block in the cache are $b_{\qquad}, b_{\qquad}, \ldots, b_{\qquad}, b_{\qquad}$ (identify the subscripts).
(iv) The address bits used as a tag for a block are $b_{\qquad}, b_{\qquad}, \ldots, b_{\qquad}, b_{\qquad}$ (identify the subscripts).
(v) The total number of bits used as tags for all the blocks in the cache is $\qquad$.
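
The five quantities in part (a) follow from the number of offset, index and tag bits. A Python sketch, assuming power-of-two sizes, 1 KB = 1024 bytes, and that only tag bits are counted (valid and dirty bits excluded); `cache_geometry` is a hypothetical helper.

```python
def cache_geometry(cache_bytes, ways, block_bytes, addr_bits=32):
    """Split a byte address into tag | index | offset (sizes must be powers of two)."""
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = sets.bit_length() - 1           # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "blocks": blocks,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        # (highest, lowest) subscripts within b_31 ... b_0
        "index_range": (offset_bits + index_bits - 1, offset_bits),
        "tag_range": (addr_bits - 1, offset_bits + index_bits),
        "total_tag_bits": blocks * tag_bits,     # tag bits only, no valid/dirty bits
    }


geo = cache_geometry(256 * 1024, ways=4, block_bytes=16)   # assumes 1 KB = 1024 bytes
print(geo)
```
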
(b) Assume that the miss rate for this cache is $5 \%$ and that the miss penalty (number of cycles needed to access a block in memory) is 90 cycles. What is the average memory access time (in cycles) for the memory hierarchy (composed of the cache + the memory)?
(c) Assume that an L2 cache is added to the memory hierarchy and that the access time for the L2 is 6 cycles. What is the average memory access time (in cycles) for the memory hierarchy if $80 \%$ of the accesses to the L2 (the misses from the L1) are hits?
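
A sketch of the two average memory access time formulas. The exam does not state the L1 hit time, so `HIT_TIME = 1` cycle below is an assumption, as is charging each L2 miss the full 90-cycle memory penalty on top of the 6-cycle L2 access; the function names are hypothetical.

```python
def amat_l1(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_l1_l2(hit_time, l1_miss_rate, l2_time, l2_hit_rate, mem_penalty):
    """An L1 miss always pays the L2 access; an L2 miss also pays the memory penalty."""
    l2_miss_rate = 1 - l2_hit_rate
    return hit_time + l1_miss_rate * (l2_time + l2_miss_rate * mem_penalty)


HIT_TIME = 1  # assumption: L1 hit time in cycles (not given in the question)
print(amat_l1(HIT_TIME, 0.05, 90))
print(amat_l1_l2(HIT_TIME, 0.05, 6, 0.80, 90))
```

With these assumptions the results are about 5.5 and 2.2 cycles; a different hit time shifts both by the same amount.
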

## Question 3 (15 points):

a) Assume that the time to open a row in a DRAM bank is 100 ns, the time to close the row is 80 ns, and the time to access a column in the row buffer is 20 ns. Assume also that the memory controller accesses two bytes X and Y in succession (it issues a request for Y as soon as it receives X) and that neither of them is in the row buffer. Compute the delay from the time the request for X is issued to the time Y is returned to the memory controller in the following cases:
(i) X and Y are in the same DRAM row and the open row policy is applied
(ii) X and Y are in the same DRAM row and the closed row policy is applied
(iii) X and Y are not in the same DRAM row and the open row policy is applied
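
The three delays are sums of open, close and column-access times, but which operations appear depends on how the row-buffer policies are interpreted. A Python sketch under explicitly assumed conventions (stated in the docstring); `delay_x_to_y` is a hypothetical name.

```python
OPEN_NS, CLOSE_NS, COLUMN_NS = 100, 80, 20   # timings given in the question


def delay_x_to_y(same_row, policy):
    """Delay (ns) from issuing the request for X until Y is returned.

    Assumptions, not stated in the exam:
      open-row policy:   another row is open in the bank, so a row-buffer miss
                         closes it first; a row stays open after it is accessed.
      closed-row policy: the bank starts closed, and each row is closed right
                         after its column access, before the next row can open.
    """
    if policy == "open":
        t = CLOSE_NS + OPEN_NS + COLUMN_NS           # X: close, open, read
        if same_row:
            t += COLUMN_NS                           # Y: row-buffer hit
        else:
            t += CLOSE_NS + OPEN_NS + COLUMN_NS      # Y: row-buffer miss
    elif policy == "closed":
        t = OPEN_NS + COLUMN_NS                      # X: open, read
        t += CLOSE_NS + OPEN_NS + COLUMN_NS          # close after X, reopen for Y
    else:
        raise ValueError(f"unknown policy: {policy}")
    return t


for case, args in [("i", (True, "open")), ("ii", (True, "closed")), ("iii", (False, "open"))]:
    print(case, delay_x_to_y(*args), "ns")
```

Under these assumptions the three cases give 220 ns, 320 ns and 400 ns; other conventions (for example, a closed bank at the start under the open-row policy) change the totals.
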
b) Assuming that each of the two $n \times n$ arrays A and B is stored row-wise in memory (A[0][0], A[0][1], A[0][2], ... and B[0][0], B[0][1], B[0][2], ...), complete the following sentences about the cache miss rate during the execution of the following loop, which stores the transpose of array A into array B:

$$
\begin{aligned}
& \text{for } (i = 0;\ i < n;\ i{+}{+}) \\
& \quad \text{for } (j = 0;\ j < n;\ j{+}{+}) \\
& \qquad B[j][i] = A[i][j];
\end{aligned}
$$

(i) When all the elements of A and B can fit into the cache and the cache block size $=4$ words, the cache miss rate during the execution of the above loop is $\qquad$ for array A and $\qquad$ for array B.
(ii) When the cache can only fit $4n$ elements (of either A or B) while the cache block size is 4 elements, the cache miss rate during the execution of the above loop is $\qquad$ for array A and $\qquad$ for array B.
(iii) When the cache can only fit $4n$ elements of A and $4n$ elements of B while the cache block size is 4 elements, the cache miss rate during the execution of the above loop is $\qquad$ for array A and $\qquad$ for array B.
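
The miss rates in part (b) can be checked with a small simulation. The sketch below assumes a fully associative cache with LRU replacement, write-allocate for the stores to B, addresses counted in elements, B placed right after A in memory, and a sample size n = 16; `LRUCache` and `transpose_miss_rates` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement (an assumed organization)."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()              # block number -> None, LRU first

    def access(self, addr):
        """Touch the element at `addr`; return True on a miss, False on a hit."""
        block = addr // self.block_size
        if block in self.blocks:
            self.blocks.move_to_end(block)       # now the most recently used
            return False
        if len(self.blocks) == self.num_blocks:
            self.blocks.popitem(last=False)      # evict the least recently used block
        self.blocks[block] = None                # write-allocate: stores also fetch
        return True


def transpose_miss_rates(n, cache_a, cache_b):
    """Run B[j][i] = A[i][j] on row-major n x n arrays; return (miss rate A, miss rate B).

    Pass the same cache object twice to model one shared cache.
    """
    base_a, base_b = 0, n * n                    # hypothetical layout: B follows A
    misses_a = misses_b = 0
    for i in range(n):
        for j in range(n):
            misses_a += cache_a.access(base_a + i * n + j)   # read A[i][j]
            misses_b += cache_b.access(base_b + j * n + i)   # write B[j][i]
    return misses_a / (n * n), misses_b / (n * n)


n, blk = 16, 4
everything = LRUCache(2 * n * n // blk, blk)      # (i) all of A and B fit
shared = LRUCache(4 * n // blk, blk)              # (ii) 4n elements in total
print(transpose_miss_rates(n, everything, everything))
print(transpose_miss_rates(n, shared, shared))
print(transpose_miss_rates(n, LRUCache(4 * n // blk, blk), LRUCache(4 * n // blk, blk)))  # (iii)
```

Under these assumptions the three runs report miss rates (A, B) of (1/4, 1/4), (1/4, 1) and (1/4, 1/4): in case (ii) the column walk through B touches n blocks per row of A, which together with A's block exceeds the n-block cache before any B block is reused.
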
c) What is the least recently used reference in the sequence of references $3,4,2,1,1,4,3,4,2,4,3$ ?
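
Finding the least recently used reference only requires the time of each value's most recent use. A short Python sketch (the function name is hypothetical):

```python
def least_recently_used(refs):
    """Return the distinct reference whose most recent use is furthest in the past."""
    last_use = {}
    for t, r in enumerate(refs):
        last_use[r] = t                  # keep only the latest time each value was used
    return min(last_use, key=last_use.get)


print(least_recently_used([3, 4, 2, 1, 1, 4, 3, 4, 2, 4, 3]))
```

Here 1 was last used at the fifth reference, earlier than the last uses of 2, 3 and 4, so the call returns 1.
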

## Question 4: (15 points):

(a) For each entry in the following table, indicate if the statement is true or false by marking the appropriate column.

| # | Statement | True | False |
| :--- | :--- | :--- | :--- |
| 1 | In a write-through cache, every store operation to a location causes a write to that location both in the cache and in memory. |  |  |
| 2 | Having larger cache block sizes exploits the property of temporal locality. |  |  |
| 3 | The CPU pipeline stalls if there is a miss in the instruction cache but not if there is a miss in the data cache. |  |  |
| 4 | Larger cache block sizes always result in a lower cache miss rate. |  |  |
| 5 | Each process in the system has its own page table. |  |  |
| 6 | For the same virtual and physical memory sizes, the size of the page table increases when the page size is smaller. |  |  |
| 7 | The page table walker is invoked when a page fault is detected. |  |  |
| 8 | With interleaved memory, the delay of fetching a block of K words is smaller than K times the delay of fetching one word. |  |  |

(b) The $(8,12)$ Hamming code encodes 8-bit data words into 12-bit code words using 4 parity bits (p1, p2, p4 and p8), according to the following template:

| Encoded data bits |  | p1 | p2 | d1 | p4 | d2 | d3 | d4 | p8 | d5 | d6 | d7 | d8 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Parity bit coverage | p1 | x |  | x |  | x |  | x |  | x |  | x |  |
|  | p2 |  | x | x |  |  | x | x |  |  | x | x |  |
|  | p4 |  |  |  | x | x | x | x |  |  |  |  | x |
|  | p8 |  |  |  |  |  |  |  | x | x | x | x | x |

Complete the following sentences:
(i) When encoding 00100100, the parity bits are:

$$
\mathrm{p1} = \qquad \mathrm{p2} = \qquad \mathrm{p4} = \qquad \mathrm{p8} = \qquad
$$

and the corresponding code word is $\qquad$.
(ii) Assuming that there are no errors in the code word 011001000011, the corresponding data word is $\qquad$.
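
The template places parity bit p_k (k = 1, 2, 4, 8) at position k and has it check every position whose binary representation includes k. A Python sketch of encoding and decoding, assuming even parity and that a data word is written d1 first (leftmost); the exam states neither, and the helper names are hypothetical.

```python
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]      # positions of d1..d8 in the template
PARITY_POS = [1, 2, 4, 8]                   # positions of p1, p2, p4, p8


def covered(p):
    """Positions (1..12) checked by the parity bit at position p."""
    return [k for k in range(1, 13) if k & p]


def encode(data):
    """'d1d2...d8' -> 12-bit code word string (assumes even parity)."""
    word = [0] * 13                         # index 0 unused; positions 1..12
    for pos, bit in zip(DATA_POS, data):
        word[pos] = int(bit)
    for p in PARITY_POS:
        word[p] = sum(word[k] for k in covered(p) if k != p) % 2
    return "".join(str(b) for b in word[1:])


def syndrome(code):
    """0 if every parity check passes, else the position of a single flipped bit."""
    word = [0] + [int(b) for b in code]
    return sum(p for p in PARITY_POS if sum(word[k] for k in covered(p)) % 2)


def decode(code):
    """Extract d1..d8 from an error-free 12-bit code word."""
    if syndrome(code) != 0:
        raise ValueError("parity check failed")
    return "".join(code[pos - 1] for pos in DATA_POS)


print(encode("00100100"))
print(decode("011001000011"))
```

A nonzero `syndrome` equals the position of a single flipped bit, which is what lets the code correct single-bit errors.
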

## Question 5 (15 points):

Consider a computer system that supports a 1 MB virtual address space (byte addressable) with 4 KB pages, 512 KB of physical memory and a 4-entry, 2-way set-associative TLB. Also, assume that the TLB configuration and the (partial) content of the page table are as shown:

| TLB |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathbf{v}$ | Tag | Physical page \# | $\mathbf{v}$ | Tag | Physical page \# |
| 1 | 0000011 | 0100100 | 0 |  |  |
| 1 | 0001111 | 0100101 | 1 | 0000111 | 0000101 |

a) Given the following sequence of references to virtual byte addresses, specify for each reference the virtual page number and indicate if the reference results in a TLB miss or hit. Also, for each reference, indicate the physical page number that results after the virtual-to-physical address translation process. If any reference results in a page fault, do not provide the physical page number.


| Virtual byte address | $\rightarrow$ Virtual page \# | $\rightarrow$ TLB hit/miss | $\rightarrow$ Physical page \# |
| :--- | :--- | :--- | :--- |
| 00001111000001000001 | $\rightarrow$ | $\rightarrow$ | $\rightarrow$ |
| 00000011101111101001 | $\rightarrow$ | $\rightarrow$ | $\rightarrow$ |
| 00000110000111111100 | $\rightarrow$ | $\rightarrow$ | $\rightarrow$ |
| 00100000000100001100 | $\rightarrow$ | $\rightarrow$ | $\rightarrow$ |
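
Each 20-bit virtual address splits into an 8-bit virtual page number (1 MB / 4 KB = 256 pages) and a 12-bit offset; with 4 entries in 2 ways the TLB has 2 sets, so the low VPN bit selects the set and the remaining 7 bits form the tag. A Python sketch of the lookup using the TLB contents from the table; it assumes the first table row is set 0 and the second is set 1, and it does not model TLB refills because the page table is not shown. `translate` is a hypothetical name.

```python
VPN_BITS = 8       # 1 MB / 4 KB = 2^8 virtual pages (12-bit page offset)
NUM_SETS = 2       # 4 entries / 2 ways

# Assumed reading of the TLB table: set index -> [(valid, tag, physical page #), ...]
TLB = {
    0: [(1, "0000011", "0100100"), (0, None, None)],
    1: [(1, "0001111", "0100101"), (1, "0000111", "0000101")],
}


def translate(vaddr):
    """Return (VPN, 'hit' or 'miss', physical page # or None) for a 20-bit address string."""
    vpn = vaddr[:VPN_BITS]                  # high 8 bits select the page
    index = int(vpn, 2) % NUM_SETS          # low VPN bit selects the TLB set
    tag = vpn[:-1]                          # remaining 7 bits are the tag
    for valid, t, ppn in TLB[index]:
        if valid and t == tag:
            return vpn, "hit", ppn
    return vpn, "miss", None                # would need the page table (not shown)


for va in ["00001111000001000001", "00000011101111101001",
           "00000110000111111100", "00100000000100001100"]:
    print(va, translate(va))
```
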

b) Assuming that physical page 0001000 is the page that is replaced in physical memory in case of a page fault, show the content of the TLB after the addresses of part (a) are referenced in the given order and all the TLB misses and page faults are serviced.

| TLB |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- |
| $\mathbf{v}$ | Tag | Physical page \# | $\mathbf{v}$ | Tag | Physical page \# |
|  |  |  |  |  |  |
|  |  |  |  |  |  |

## Question 6 (15 points):

Consider 4 processors, P0, P1, P2 and P3, each with its private cache, with a snoopy bus connecting the caches to memory. Assume a cache block size of two words and let (x, y) and (z, w) be two cache blocks that map to the same cache location, L. As shown in the tables below, assume that initially the caches of P0 and P1 contain (x, y) in the shared state (S), the cache of P2 contains (z, w) in the exclusive state (E), and location L in the cache of P3 is invalid (I). For each of the actions indicated in (a), (b) and (c), start from the initial state and list the bus activities and the final state of location L in all the caches. Use one or more of the following to specify the activities on the bus:

- Pi requests (, ) to read
- Pi requests (, ) to write
- Pi posts "invalidate" (, )
- Pi writes back (, )
- Memory returns the value of (, )
(a) P1 stores (writes) 50 into y

|  | State of L in P0 | State of L in P1 | State of L in P2 | State of L in P3 |
| :---: | :---: | :---: | :---: | :---: |
| Initial state | x = 10, y = 20 (S) | x = 10, y = 20 (S) | z = 30, w = 40 (E) | I |
| Bus activity(ies) |  |  |  |  |
| Final state |  |  |  |  |

(b) P2 stores (writes) 30 into x

|  | State of L in P0 | State of L in P1 | State of L in P2 | State of L in P3 |
| :--- | :---: | :---: | :---: | :---: |
| Initial state | x = 10, y = 20 (S) | x = 10, y = 20 (S) | z = 30, w = 40 (E) | I |
| Bus activity(ies) |  |  |  |  |
| Final state |  |  |  |  |

(c) P3 loads (reads) w

|  | State of L in P0 | State of L in P1 | State of L in P2 | State of L in P3 |
| :--- | :--- | :--- | :--- | :---: |
| Initial state | x = 10, y = 20 (S) | x = 10, y = 20 (S) | z = 30, w = 40 (E) | I |
| Bus activity(ies) |  |  |  |  |
| Final state |  |  |  |  |
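
The three scenarios can be traced with a small model of a write-back, write-invalidate snoopy protocol with the states S, E and I. The Python sketch below assumes that E means exclusive and dirty (consistent with part (d)), that a dirty owner writes the block back before memory returns it to a requester, and that a write miss fetches the block (write-allocate). Memory's copy of (z, w) is unknown, so it starts as `None`; all class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """Location L in one processor's cache."""
    block: tuple = ()                            # word names, e.g. ("x", "y")
    values: dict = field(default_factory=dict)
    state: str = "I"                             # "S", "E" (exclusive, dirty) or "I"


def fmt(block):
    return "(" + ", ".join(block) + ")"


class SnoopySystem:
    """Assumed write-back, write-invalidate protocol with states S, E and I."""

    def __init__(self, lines, memory):
        self.lines, self.memory, self.log = lines, memory, []

    def _write_back(self, p):
        line = self.lines[p]
        self.log.append(f"P{p} writes back {fmt(line.block)}")
        self.memory.update(line.values)

    def _fetch(self, p, block, purpose):
        """Miss handling: save a dirty victim, broadcast the request, memory replies."""
        line = self.lines[p]
        if line.state == "E":                    # dirty victim must be saved first
            self._write_back(p)
        self.log.append(f"P{p} requests {fmt(block)} to {purpose}")
        for q, other in enumerate(self.lines):
            if q != p and other.block == block and other.state != "I":
                if other.state == "E":           # owner flushes its up-to-date copy
                    self._write_back(q)
                other.state = "S" if purpose == "read" else "I"
        self.log.append(f"Memory returns the value of {fmt(block)}")
        line.block, line.values = block, {w: self.memory[w] for w in block}

    def read(self, p, block, word):
        line = self.lines[p]
        if line.block != block or line.state == "I":
            self._fetch(p, block, "read")
            line.state = "S"
        return line.values[word]

    def write(self, p, block, word, value):
        line = self.lines[p]
        if line.block == block and line.state == "S":
            self.log.append(f'P{p} posts "invalidate" {fmt(block)}')
            for q, other in enumerate(self.lines):
                if q != p and other.block == block:
                    other.state = "I"
        elif line.block != block or line.state == "I":
            self._fetch(p, block, "write")
        line.state = "E"
        line.values[word] = value


def initial_system():
    """Initial state from the tables; memory's copy of (z, w) is unknown."""
    lines = [Line(("x", "y"), {"x": 10, "y": 20}, "S"),
             Line(("x", "y"), {"x": 10, "y": 20}, "S"),
             Line(("z", "w"), {"z": 30, "w": 40}, "E"),
             Line()]
    return SnoopySystem(lines, {"x": 10, "y": 20, "z": None, "w": None})
```

For example, `initial_system().read(3, ("z", "w"), "w")` logs P3's read request, P2's write-back and memory's reply, and under these assumptions leaves all four caches with location L in state S.
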

(d) Complete the following sentences:

- The state of a block is "shared" when the valid bit = $\qquad$ and the dirty bit = $\qquad$.
- The state of a block is "exclusive" when the valid bit = $\qquad$ and the dirty bit = $\qquad$.
- The state of a block is "invalid" when the valid bit = $\qquad$ and the dirty bit = $\qquad$.

