
1 Learning Outcomes

We are finally, finally ready to consider access to the full memory hierarchy. How does this memory hierarchy improve performance?

Figure 1: The full memory hierarchy. Now, we consider access to memory caches, main memory (primary storage), and disk (secondary storage).

2 Virtual Memory and Caches

Virtual memory and caches are tricky to reason about together because they share neither terminology nor implementation. Virtual memory is the older concept, born of the limitations of tiny main memories; caches came later, built purely for performance. Nevertheless, they are both layers in the memory hierarchy.

To compare virtual memory to caches, let us consider a horizontal view of the memory hierarchy shown in Figure 2.

Figure 2: Layers of the hierarchy on the left are closer to the CPU and access is faster; as we move right, modules get further from the CPU and access is much slower.

Terminology: Memory units. Blocks, pages, words, and bytes are all units in the memory hierarchy shown in Figure 2 above.
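To make those sizes concrete, here is a quick Python sketch (sizes are illustrative assumptions, including the 4-byte word; real machines vary) computing how many offset bits each unit needs:

```python
import math

# Illustrative sizes for the units named above (typical values; real
# machines vary). The 4-byte word is our own assumption (a 32-bit word).
units = {
    "word": 4,      # bytes
    "block": 64,    # bytes: one cache line/block
    "page": 4096,   # bytes: one virtual memory page
}

for name, size in units.items():
    bits = int(math.log2(size))  # offset bits needed to address within the unit
    print(f"{name:>5}: {size:>5} bytes -> {bits:>2} offset bits")
```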

Copies of data. Remember what we learned when we first explored other layers of the memory hierarchy in a previous section:

Figure 3: The 🙂 data in the L1 cache is also available in a block of the L2 cache, on a physical page of main memory, and on a disk page.

Recall from an earlier section that page tables do not hold data; they translate addresses and facilitate demand paging.

Figure 4: With demand paging, pages on disk are loaded into memory only when needed by the process. The location of each page is tracked by status bits in the page table—here, the valid bit in each page table entry, combined with the physical page number (if the valid bit is set).
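Here is a minimal sketch of demand paging in Python, assuming a single-level page table; the names (`PageTableEntry`, `load_page_from_disk`) and the 8-frame memory are our own illustrations:

```python
from dataclasses import dataclass

@dataclass
class PageTableEntry:
    valid: bool = False            # is this page resident in main memory?
    physical_page_number: int = 0  # meaningful only when valid is set

def load_page_from_disk(vpn):
    # Stand-in for the OS's demand-paging work: choose a physical frame
    # (evicting another page if necessary) and read the page in from disk.
    return vpn % 8  # pretend there are 8 physical frames

def translate(page_table, vpn):
    pte = page_table[vpn]
    if not pte.valid:
        # Page fault: the page is only on disk, so load it on demand.
        pte.physical_page_number = load_page_from_disk(vpn)
        pte.valid = True
    return pte.physical_page_number

page_table = [PageTableEntry() for _ in range(16)]
print(translate(page_table, 3))  # first access faults and loads the page
print(translate(page_table, 3))  # second access finds the page in memory
```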

The full comparison of these two parts of the memory hierarchy is in Table 1.

Table 1: Terminology of virtual memory vs. caches.

| Feature | Caches | Virtual Memory |
| --- | --- | --- |
| In memory hierarchy | Caches ↔ Memory | Memory ↔ Disk |
| Memory unit | Line or Block (~64 bytes) | Page (~4096 bytes) |
| Miss | Cache Miss | Page Fault |
| Associativity | Direct-mapped, N-way set associative, fully associative | “Fully associative” (location determined by OS) |
| Replacement policy | Least-recently-used (LRU) or random | LRU (most common), FIFO, or random |
| Write policy | Write-through or write-back | Write-back |
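To see the replacement policy that both columns of Table 1 share, here is a minimal LRU sketch in Python (a fully associative structure; the class and the access trace are our own illustration):

```python
from collections import OrderedDict

# A fully associative structure with `capacity` slots; keys stand in
# for block or page identifiers.
class LRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry is first

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: now most recently used
            return "hit"
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = None  # miss: bring the block/page in
        return "miss"

lru = LRU(capacity=2)
print([lru.access(k) for k in "ABACB"])
# ['miss', 'miss', 'hit', 'miss', 'miss']: the hit on A makes B the
# least recently used entry, so C evicts B; B's return then evicts A.
```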

Table 1 above does not include terminology for the TLB, which is our cache for address translations. To differentiate misses in the TLB from those in memory caches, we use the terms TLB hit and TLB miss.

3 Physically Indexed, Physically Tagged Caches

We would love to put everything together: memory caches, TLB, page tables, disks—you name it! But in what order do we access them? Consider that question as you look at Figure 5.

Figure 5: Putting it all together: what is the order in which we access things? This scenario assumes one layer of caches.

Other, more complicated designs exist, but in this class we assume physically indexed, physically tagged (PIPT) caches. In Figure 6, the fully associative TLB uses the virtual page number to construct its tag.[1] By contrast, the memory cache uses the physical address to construct its tag and index,[2] and the cache is indexed via this physical index. Note that the page offset and block offset are different sizes, because blocks are smaller than pages.

Figure 6: Physically indexed, physically tagged caches.
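To make the two address splits concrete, here is a sketch under assumed sizes (4 KiB pages, 64-byte blocks, a 64-set cache; none of these are mandated by the figure):

```python
# Assumed sizes: 4 KiB pages (12 offset bits), 64 B blocks (6 offset
# bits), and a 64-set cache (6 index bits).
PAGE_OFFSET_BITS = 12
BLOCK_OFFSET_BITS = 6
CACHE_INDEX_BITS = 6

def split_virtual(va):
    # Fully associative TLB: the whole virtual page number is the tag.
    vpn = va >> PAGE_OFFSET_BITS
    page_offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    return vpn, page_offset

def split_physical(pa):
    # PIPT cache: tag and index both come from the physical address.
    block_offset = pa & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (pa >> BLOCK_OFFSET_BITS) & ((1 << CACHE_INDEX_BITS) - 1)
    tag = pa >> (BLOCK_OFFSET_BITS + CACHE_INDEX_BITS)
    return tag, index, block_offset

print(split_virtual(0x12345))   # (18, 837): VPN and page offset
print(split_physical(0x12345))  # (18, 13, 5): tag, index, block offset
```

Notice that the block offset (6 bits) is a strict subset of the page offset (12 bits), which is exactly what "blocks are smaller than pages" buys us.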

A PIPT cache design determines the order of access as shown in Figure 7:

  1. Address translation: First translate virtual address to physical address.

  2. Data access: Use the physical address to access the data in the memory hierarchy.

Figure 7: With PIPT caches, address translation happens first. This memory scenario assumes one layer of caches.[3]
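Here is the same two-step order as a Python sketch; the dictionaries standing in for the TLB, page table, cache, and memory are our own simplification, and we ignore page faults:

```python
PAGE_OFFSET_BITS = 12  # assume 4 KiB pages

def load(va, tlb, page_table, cache, memory):
    vpn = va >> PAGE_OFFSET_BITS
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)

    # Step 1: address translation (check the TLB first; walk the page
    # table on a TLB miss). Page faults are ignored in this sketch.
    if vpn in tlb:
        ppn = tlb[vpn]          # TLB hit
    else:
        ppn = page_table[vpn]   # TLB miss: consult the page table...
        tlb[vpn] = ppn          # ...and cache the translation
    pa = (ppn << PAGE_OFFSET_BITS) | offset

    # Step 2: data access, indexing the cache with the physical address.
    if pa not in cache:
        cache[pa] = memory[pa]  # cache miss: fill from main memory
    return cache[pa]

memory = {0x5042: "🙂"}
print(load(0x1042, tlb={}, page_table={0x1: 0x5}, cache={}, memory=memory))
```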

4 Revisiting AMAT: Impact of Paging

At a high level, Table 2 reviews the relevant metrics.

Table 2: Memory hierarchy metrics (review terminology): hit rate, hit time, miss rate, and miss penalty. We ignore the TLB for simplicity in computing AMAT, but advanced readers can try incorporating it.

| Feature | Cache | Virtual Memory |
| --- | --- | --- |
| Unit | Block (32-64 B) | Page (4-16 KiB) |
| Miss rate | 1% to 20% | 0.001% |
| Hit time | ≈ 1 cycle | ≈ 100 cycles |
| Miss penalty | ≈ 100 cycles | ≈ 5M cycles |

Let’s compute AMAT for a specific example. Suppose we have the following parameters (with no TLB): a 1-cycle L1 hit time with a 5% L1 miss rate, a 10-cycle L2 access time with a 40% local L2 miss rate, a 200-cycle main memory access time, and a 20,000,000-cycle disk access time.

Note that for a 2 GHz clock, 200 cycles is 100 ns, and 20,000,000 cycles is 10 ms.

Suppose we only have DRAM and never access disk. Average memory access time in this case is $AMAT_{no}$ below:

$$
\begin{aligned}
AMAT_{no} &= 1 + 0.05 \times (\text{L1 miss penalty}) \\
&= 1 + 0.05 \times (10 + 0.40 \times (\text{L2 miss penalty})) \\
&= 1 + 0.05 \times 10 + 0.05 \times 0.40 \times 200 \\
&= 5.5 \text{ clock cycles}
\end{aligned}
$$
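The same computation in code, with the parameters copied from the example:

```python
# Parameters from the example: 1-cycle L1 hit, 5% L1 miss rate,
# 10-cycle L2 access, 40% local L2 miss rate, 200-cycle memory access.
l1_hit, l1_miss = 1, 0.05
l2_access, l2_miss = 10, 0.40
mem_access = 200

amat_no = l1_hit + l1_miss * (l2_access + l2_miss * mem_access)
print(amat_no)  # 5.5 clock cycles
```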

With paging, i.e., disk access, define a rate $R$ that is our “hit rate” to memory. Equivalently, $1 - R$ is our probability of a page fault. AMAT becomes:

$$
\begin{aligned}
AMAT &= 1 + 0.05 \times 10 + 0.05 \times 0.40 \times (200 + (1 - R) \times 20{,}000{,}000) \\
&= AMAT_{no} + (0.05 \times 0.40 \times (1 - R) \times 20{,}000{,}000)
\end{aligned}
$$

The second term is a performance cost proportional to the page fault rate of our demand paging system. We mentioned above in Table 2 that the miss rate for virtual memory is 0.001%. This is our page fault rate, $1 - R$. A value of $1 - R = 0.001\% = 0.00001$ seems like a tiny rate, but we can quickly see how AMAT explodes with higher rates:

The last of these is about 680 times slower than the first of these. That’s really, really, REALLY slow...!
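As a rough illustration, here is the AMAT formula as a Python function, evaluated at a few page-fault rates of our own choosing (the exact rates behind the comparison above are not shown here, so these are assumptions):

```python
DISK_ACCESS = 20_000_000  # cycles, from the example parameters

def amat(fault_rate):
    # AMAT = 1 + 0.05 * (10 + 0.40 * (200 + (1 - R) * disk penalty))
    return 1 + 0.05 * (10 + 0.40 * (200 + fault_rate * DISK_ACCESS))

for rate in (0.00001, 0.0001, 0.001, 0.01):
    print(f"1 - R = {rate:.5f}: AMAT = {amat(rate):>7,.1f} cycles")
# 1 - R = 0.00001: AMAT =     9.5 cycles
# 1 - R = 0.00010: AMAT =    45.5 cycles
# 1 - R = 0.00100: AMAT =   405.5 cycles
# 1 - R = 0.01000: AMAT = 4,005.5 cycles
```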

Given this analysis, we hope we have convinced you how costly page faults are, and why handing page placement and replacement in memory over to software, namely the OS, is well worth it.

That wraps up virtual memory and our exploration of the memory hierarchy! Congratulations!!!

Footnotes
  1. In low-associativity TLBs, the virtual page number is split into the TLB tag and the TLB index; then, the TLB is indexed by this “virtual index.”

  2. Often, the physical index and tag comprise the physical page number.

  3. If there are multiple levels of caches, treat them all as part of the “Cache” block in Figure 7 and assume they are all physically indexed, physically tagged. If there is a miss in a higher-level cache, go to a lower level of cache. If the lowest-level cache misses, then go to memory and get a block for the lowest-level cache, then copy the block to the second-lowest-level cache, etc.