











## Communication with Distributed Memory Architectures

### Distributed shared-memory

- One logical memory distributed among physical memories
- I.e., address space is shared (same shared address on two processors refers to the same location)
- Implicit shared communication (via shared address space)
- NUMA: Non-uniform memory access (why?)

### Multicomputers

- Separate private address spaces for each PE
- Same address on two processors: two different locations
- Explicit communication (message passing)
- Libraries for standard communication primitives (e.g., MPI)
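The contrast between implicit and explicit communication can be illustrated with a minimal sketch. It uses Python threads only as a stand-in for processing elements (real threads always share one address space), and the names `shared`, `channel`, `sender`, and `receiver` are hypothetical, chosen for this illustration.

```python
import queue
import threading

# Shared-address-space style: both sides name the same location
# (the entry "x"), so communication is implicit in ordinary stores/loads.
shared = {"x": 0}
lock = threading.Lock()

def shared_writer():
    with lock:
        shared["x"] = 42          # a plain store; any reader of "x" sees it

# Message-passing style: each side has its own private variable x;
# the value moves only through an explicit send/receive pair.
channel = queue.Queue()
received = {}

def sender():
    x = 42                        # private to the sender
    channel.put(x)                # explicit send

def receiver():
    x = channel.get()             # explicit receive into the receiver's own x
    received["x"] = x

threads = [threading.Thread(target=f) for f in (shared_writer, sender, receiver)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["x"], received["x"])
```

In a real multicomputer the send/receive pair would typically come from a library such as MPI rather than an in-process queue.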






### Latency Hiding

- Overlap communication with other communications or computations
- Can be difficult to exploit and application dependent
- Flexible communication mechanisms
  - Perform well with
    - Smaller and larger transmissions
    - Irregular and regular communication patterns
  - I.e., not overly optimized
  - However, communication performance may improve if optimized for specific patterns (e.g., interconnection topology)
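The overlap described under latency hiding can be sketched as follows. This is an illustration under stated assumptions, not a real communication layer: `fetch_remote` is a hypothetical stand-in that simulates transfer latency with a sleep, and `local_work` is any computation that does not depend on the data in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_remote(n):
    """Hypothetical remote transfer: waits, then returns n values."""
    time.sleep(0.05)              # simulated communication latency
    return list(range(n))

def local_work(n):
    """Independent computation that does not need the remote data."""
    return sum(i * i for i in range(n))

def overlapped(n):
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_remote, n)   # start the transfer
        partial = local_work(n)                  # compute while it is in flight
        remote = pending.result()                # block only when the data is needed
    return partial + sum(remote)

result = overlapped(1000)
print(result)
```

Whether this helps in practice depends on the application having enough independent work to cover the transfer, which is why the technique is application dependent.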



# Shared Memory vs. Message Passing

| Shared Memory | Message Passing |
|---|---|
| Compatibility, well understood | Simpler hardware (no coherence mechanism needed) |
| Ease of programming for complex communication (just do it!) | |
| Better for smaller communications: protection implemented in the HW rather than in the OS | |

- Hardware-controlled caching: automatic caching of shared and private data
- Easy to implement message passing on top of shared memory since it's just a memory copy
- Shared memory can be built on top of message passing, but the cost is very high (every access becomes a message!)
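The asymmetry between the two layerings can be made concrete with a small sketch. Both classes are hypothetical models written for this illustration: `SharedBufferChannel` implements send as a copy into a region both sides can see, while `MessageBackedMemory` counts one request message per load or store (replies and acknowledgements are not counted, which is a simplifying assumption).

```python
class SharedBufferChannel:
    """Message passing on top of shared memory: send is a memory copy."""

    def __init__(self):
        self.buffer = None                # region visible to both sides

    def send(self, data):
        self.buffer = bytes(data)         # copy into the shared region

    def recv(self):
        data, self.buffer = self.buffer, None
        return data


class MessageBackedMemory:
    """Shared memory on top of message passing: every access is a message."""

    def __init__(self):
        self._home = {}                   # data held by the (hypothetical) owning node
        self.messages = 0                 # request messages sent so far

    def load(self, addr):
        self.messages += 1                # one request per load
        return self._home.get(addr, 0)

    def store(self, addr, value):
        self.messages += 1                # one request per store
        self._home[addr] = value


chan = SharedBufferChannel()
chan.send(b"hello")
msg = chan.recv()

mem = MessageBackedMemory()
for i in range(100):
    mem.store(i, i)
total = sum(mem.load(i) for i in range(100))

print(msg, total, mem.messages)   # b'hello' 4950 200
```

One hundred stores and one hundred loads cost 200 messages in this model, whereas the channel built on shared memory needs only a single copy per send.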






# Coherence Mechanisms

- Migration
  - Data moved to a local cache where it can be accessed locally
  - Reduces latency to shared data that is allocated remotely
- Replication
  - Copies of shared data that can be read by multiple processors
  - Reduces latency and contention for shared items
- Directory-based: centralized directory tracks current location of data
- Snooping: state of blocks kept at local caches by watching interconnect (bus) transactions
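As a rough illustration of the directory-based approach, the sketch below keeps, for each block, the set of caches that currently hold a copy. The class `Directory` and its methods are hypothetical, and the sketch assumes an invalidation policy on writes; the slides do not specify one at this point.

```python
class Directory:
    """Per-block record of which caches hold a copy (illustrative only)."""

    def __init__(self):
        self.sharers = {}                 # block -> set of cache ids holding it

    def read(self, cache_id, block):
        # Replication: the reader is added alongside existing holders.
        self.sharers.setdefault(block, set()).add(cache_id)

    def write(self, cache_id, block):
        # Assumed policy: the writer becomes the only holder,
        # and every other holder must drop its copy.
        invalidated = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}
        return invalidated


d = Directory()
d.read("A", "X")
d.read("B", "X")
dropped = d.write("A", "X")
print(dropped, d.sharers["X"])   # {'B'} {'A'}
```

A snooping scheme reaches the same outcome without a central record: each cache watches the bus and updates its own block state.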






# Example: Invalidate

| Processor Activity | Bus Activity | Cache Contents for CPU A | Cache Contents for CPU B | Memory Contents for location X |
|--------------------|--------------|--------------------------|--------------------------|--------------------------------|
|                    |              |                          |                          | 0                              |
| CPU A reads X      | Cache miss for X | 0                    |                          | 0                              |
| CPU B reads X      | Cache miss for X | 0                    | 0                        | 0                              |
| CPU A writes 1 to X | Invalidation for X | 1                 |                          | 0                              |
| CPU B reads X      | Cache miss for X | 1                    | 1                        | 1                              |
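The rows of the invalidate example can be reproduced with a small simulation. This is a sketch under assumptions: a single location X, two CPUs, and write-back caches (memory stays stale after the write until the next miss forces the owner to supply the value). The class `InvalidateBus` is hypothetical, and `None` stands for an empty cell in the table.

```python
class InvalidateBus:
    """One-location model of a write-invalidate, write-back scheme."""

    def __init__(self, cpus=("A", "B")):
        self.cache = {c: None for c in cpus}   # None means no valid copy of X
        self.dirty = None                      # CPU holding a modified copy, if any
        self.memory = 0

    def read(self, cpu):
        if self.cache[cpu] is not None:
            return ""                          # hit: no bus activity
        if self.dirty is not None:             # owner supplies data, memory is updated
            self.memory = self.cache[self.dirty]
            self.dirty = None
        self.cache[cpu] = self.memory
        return "Cache miss for X"

    def write(self, cpu, value):
        for other in self.cache:               # snooping caches drop their copies
            if other != cpu:
                self.cache[other] = None
        self.cache[cpu] = value
        self.dirty = cpu                       # memory is now stale
        return "Invalidation for X"

    def row(self):
        return (self.cache["A"], self.cache["B"], self.memory)


bus = InvalidateBus()
trace = []
for op, cpu, val in [("r", "A", None), ("r", "B", None), ("w", "A", 1), ("r", "B", None)]:
    activity = bus.read(cpu) if op == "r" else bus.write(cpu, val)
    trace.append((activity,) + bus.row())

for entry in trace:
    print(entry)
```

Each printed tuple is (bus activity, CPU A contents, CPU B contents, memory contents), in the same order as the table columns.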


# Example: Write Update

| Processor Activity | Bus Activity | Cache Contents for CPU A | Cache Contents for CPU B | Memory Contents for location X |
|--------------------|--------------|--------------------------|--------------------------|--------------------------------|
|                    |              |                          |                          | 0                              |
| CPU A reads X      | Cache miss for X | 0                    |                          | 0                              |
| CPU B reads X      | Cache miss for X | 0                    | 0                        | 0                              |
| CPU A writes 1 to X | Write update for X | 1                 | 1                        | 1                              |
| CPU B reads X      |              | 1                        | 1                        | 1                              |
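The write update example can be traced the same way. The sketch below assumes a single location X, two CPUs, and that a write broadcasts the new value to every cache holding a copy and to memory, as the table shows. The class `UpdateBus` is hypothetical, and `None` stands for an empty cell.

```python
class UpdateBus:
    """One-location model of a write-update (write-broadcast) scheme."""

    def __init__(self, cpus=("A", "B")):
        self.cache = {c: None for c in cpus}   # None means no valid copy of X
        self.memory = 0

    def read(self, cpu):
        if self.cache[cpu] is not None:
            return ""                          # hit: no bus activity
        self.cache[cpu] = self.memory
        return "Cache miss for X"

    def write(self, cpu, value):
        self.cache[cpu] = value
        for other in self.cache:               # broadcast to caches holding X
            if other != cpu and self.cache[other] is not None:
                self.cache[other] = value
        self.memory = value                    # memory is updated as well
        return "Write update for X"

    def row(self):
        return (self.cache["A"], self.cache["B"], self.memory)


bus = UpdateBus()
trace = []
for op, cpu, val in [("r", "A", None), ("r", "B", None), ("w", "A", 1), ("r", "B", None)]:
    activity = bus.read(cpu) if op == "r" else bus.write(cpu, val)
    trace.append((activity,) + bus.row())

for entry in trace:
    print(entry)
```

The final read by CPU B generates no bus activity because its copy was updated in place rather than invalidated.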






















# Example

Assume: A1 and A2 map to the same cache block B; the initial cache state is invalid.

RH = read hit, RM = read miss, WH = write hit, WM = write miss.

| Event                   | In P1's cache      | In P2's cache      |
|-------------------------|--------------------|--------------------|
|                         | B = invalid        | B = invalid        |
| P1 writes 10 to A1      | A1 = 10 (modified) | B = invalid        |
| P1 reads A1 (RH)        | A1 = 10 (modified) | B = invalid        |
| P2 reads A1 (RM)        | A1 = 10 (shared)   | A1 = 10 (shared)   |
| P2 writes 20 to A1 (WH) | B = invalid        | A1 = 20 (modified) |
| P2 writes 40 to A2 (WM) |                    |                    |
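The state changes in the example can be checked against a small simulation. This sketch assumes a three-state invalidate protocol (invalid, shared, modified) with write-back caches, and models each cache as a single block because A1 and A2 map to the same block B. The classes `Cache` and `System` are hypothetical. The trace covers only the events whose outcome the table shows.

```python
INVALID, SHARED, MODIFIED = "invalid", "shared", "modified"

class Cache:
    """One-block cache: holds at most one address at a time."""

    def __init__(self):
        self.addr, self.value, self.state = None, None, INVALID

    def holds(self, addr):
        return self.state != INVALID and self.addr == addr

    def show(self):
        if self.state == INVALID:
            return "B = invalid"
        return f"{self.addr} = {self.value} ({self.state})"


class System:
    def __init__(self):
        self.caches = {"P1": Cache(), "P2": Cache()}
        self.memory = {}

    def _others(self, who):
        return [c for name, c in self.caches.items() if name != who]

    def _evict(self, c):
        if c.state == MODIFIED:
            self.memory[c.addr] = c.value      # write back before replacement
        c.state = INVALID

    def read(self, who, addr):
        c = self.caches[who]
        if c.holds(addr):
            return "RH"
        self._evict(c)
        for o in self._others(who):
            if o.holds(addr) and o.state == MODIFIED:
                self.memory[addr] = o.value    # owner writes back
                o.state = SHARED               # and keeps a shared copy
        c.addr, c.value, c.state = addr, self.memory.get(addr, 0), SHARED
        return "RM"

    def write(self, who, addr, value):
        c = self.caches[who]
        kind = "WH" if c.holds(addr) else "WM"
        if kind == "WM":
            self._evict(c)
        for o in self._others(who):
            if o.holds(addr):
                if o.state == MODIFIED:
                    self.memory[addr] = o.value
                o.state = INVALID              # other copies are invalidated
        c.addr, c.value, c.state = addr, value, MODIFIED
        return kind


s = System()
trace = []
events = [("write", "P1", "A1", 10), ("read", "P1", "A1", None),
          ("read", "P2", "A1", None), ("write", "P2", "A1", 20)]
for op, who, addr, val in events:
    kind = s.read(who, addr) if op == "read" else s.write(who, addr, val)
    trace.append((kind, s.caches["P1"].show(), s.caches["P2"].show()))

for entry in trace:
    print(entry)
```

Each printed tuple is (hit or miss kind, P1's cache, P2's cache), matching the table columns for the rows from the first write through P2's write of 20.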





