1 Learning Outcomes

2 Shared memory multiprocessor (SMP)

Recall that in our multicore processor architecture we assume a shared memory model to enable multithreaded processing. This model is called a shared memory multiprocessor (SMP), which assumes a single physical address space across all processors.[1]

Given our understanding of the memory hierarchy and Jim Gray’s space-time analogy of locality, memory is a performance bottleneck even with one processor. Shared memory multiprocessors use caches to reduce bandwidth demands on main memory, as shown in Figure 1.

"Shared memory multiprocessor diagram: several cores or sockets connect through a bus to a unified physical memory and optionally a shared last-level cache. One physical address space visible to all processors."

Figure 1: Shared-memory Multiprocessor (SMP) with multiple cores and a single, coherent memory.



3 Cache Coherence Problem

In a different section, we discuss how threads running on multiple processors can use locks to synchronize access to shared data across processors. In this section, we discuss an additional problem that arises when we introduce caching: cache coherence.

Consider three example memory accesses on a dual-core system. Assume the word 20 is initially in memory @ address 0x5000, and we perform three memory accesses:

  1. CPU 1 reads word @ address 0x5000.

  2. CPU 2 reads word @ address 0x5000.

  3. CPU 1 writes word 40 @ address 0x5000.

Figure 2 shows that accesses 1 and 2, which are reads, trigger compulsory cache misses in both CPU 1’s and CPU 2’s caches. The two caches must request the corresponding block from memory, via the communication bus. Each processor gets a copy of this block (and therefore a copy of the word @ address 0x5000) and stores the block in its own cache.


"Visual description of a dual-core SMP system. The SMP system diagram illustrates several cores or sockets connecting through a shared bus to a unified physical memory and to I/O. There are four directed arrows between each CPU's cache and the shared memory unit to show the bus access for the two memory accesses: (1) CPU 1 **reads** word @ address `0x5000`, and (2) CPU 2 **reads** word @ address `0x5000`. One set of arrows is labeled request (cache via bus to memory); the other set is labeled response (memory via bus to cache). Each CPU cache has a copy of the word `20` at memory address `0x5000`."

Figure 2: CPU 1 and CPU 2 both read a word @ address 0x5000. If both caches are cold, these two memory accesses are compulsory cache misses, and the value must be retrieved from shared memory via the shared bus.

Figure 3 reveals the issue with access 3, which is a write. After CPU 1 performs the write, CPU 1’s cache is up to date, but CPU 2’s cache is now stale, and CPU 2 has no way of knowing.

"Dual-core SMP system, continued. Next, CPU 1 performs a memory write to word `40` @ address `0x5000`. Now, CPU 1's cache has a copy of the word `40` at memory address `0x5000`, but CPU 2's cache still has the word `20` at memory address `0x5000`."

Figure 3: CPU 1 performs a memory write to word 40 @ address 0x5000. In a non-cache-coherent system, CPU 1 and CPU 2 now have different copies of the same region of memory.

The last access in our example illustrates that this system is not cache coherent. From Wikipedia:

In a cache coherent system, if multiple clients have a cached copy of the same region of a shared memory resource, all copies are the same.

P&H defines cache coherence as the aspect that defines what values can be returned by a read. There must be a way of enforcing the “coherency” implied by the phrase: “all copies are the same.” We do so using an additional type of cache miss.

4 Coherence Miss

To enforce cache coherence, we introduce a fourth type of cache miss: a coherence miss, i.e., a communication miss caused by writes to shared data made by other processors.

Such misses are commonly part of cache coherence protocols, which are means of maintaining coherence for multiple processors. For example, a protocol can ensure that a processor has “exclusive access” to a data item by invalidating copies in other caches on a write. Subsequently, a processor that reads (or writes) to an invalidated copy then misses in the cache; this miss is categorized as a coherence miss.

5 Snooping Protocols

One implementation of the write-invalidate cache coherence protocol described above is a snooping protocol: because every memory access travels over the shared bus, the bus itself can be used to “snoop”[2] and notify the other processors.

Each cache controller “snoops” on the common bus for write transactions. When another processor requests a block on the bus, each controller checks whether its own cache holds a copy.

This snooping protocol permits many processors to have copies of data that are only read, and permits a processor that is writing to have an exclusive copy of the data (because other copies are invalidated).

5.1 Details

MOESI is a full cache coherence protocol whose states subsume those of other cache protocols: Modified, Owned, Exclusive, Shared, Invalid.

For each block in a cache, track its coherence state (one of the MOESI states above):
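The per-block state can be sketched as an enum plus a tag, with one illustrative transition rule: what a cache does to its own copy when it snoops another processor's read on the bus. This is a sketch of the MOESI states named above, not a full transition table, and all identifiers are hypothetical:

```c
/* MOESI coherence state tracked per cache block (illustrative sketch). */
typedef enum {
    MODIFIED,   /* dirty; the only copy, so memory is stale */
    OWNED,      /* dirty, but other caches may hold Shared copies;
                   this cache is responsible for supplying/writing back data */
    EXCLUSIVE,  /* clean and the only cached copy */
    SHARED,     /* copy that other caches may also hold */
    INVALID     /* no usable copy; the next access misses */
} CoherenceState;

typedef struct {
    unsigned tag;          /* which block this entry holds */
    CoherenceState state;
    /* ... block data ... */
} CacheBlock;

/* Illustrative snoop rule: how this cache's copy changes when it
 * observes ANOTHER processor's read of the same block on the bus. */
static CoherenceState on_remote_read(CoherenceState s) {
    switch (s) {
    case MODIFIED:  return OWNED;   /* keep dirty data, now shared */
    case EXCLUSIVE: return SHARED;  /* no longer the only copy */
    default:        return s;       /* OWNED/SHARED/INVALID unchanged */
    }
}
```

The Owned state is what distinguishes MOESI from simpler protocols: a dirty block can be shared without first writing it back to memory, because the owning cache supplies the data.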

Two enhancements:

UC Berkeley has explored various snooping[3] protocols; see an advanced computer architecture course for more information.

Footnotes
  1. Given the shared address space, a more accurate term for shared memory multiprocessor might be shared-address multiprocessor. You may also see the term symmetric multiprocessor, but we digress.

  2. From Merriam-Webster: to look or pry especially in a sneaking or meddlesome manner.

  3. Sometimes you will see snooping protocols called Snoopy Protocols and snooping buses called Snoopy Buses, like the Peanuts character.