
1 Learning Outcomes

Consider the address translation discussed in an earlier section. If we have a 1-MiB page table and a 128-KiB L1 cache, the page table must be stored in memory, not in the cache.

2 Page Table Walks

At present, we must perform a page table walk, meaning we must access the page table to get the physical page number for address translation.[1] Remember that the current process’s page table is located in main memory. Because page tables are located in memory (Figure 1), we must access main memory twice on every load or store. This takes several hundred cycles!


Figure 1: In address translation, there can be two accesses to memory on every load or store instruction.
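The two accesses can be sketched as follows. This is a toy model, not the course’s code: the page size, the mappings, and the access counter are all made up for illustration.

```python
# Toy sketch of a load WITHOUT a TLB: every load pays one memory access
# for the page table entry and a second for the data itself.
# (Page size, mappings, and values below are made-up assumptions.)

PAGE_SIZE = 4096  # assume 4 KiB pages

memory_accesses = 0

def read_memory(page_table, physical_memory, vaddr):
    """Translate vaddr via the in-memory page table, then fetch the data."""
    global memory_accesses
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = page_table[vpn]          # access #1: page table walk (in memory)
    memory_accesses += 1
    paddr = ppn * PAGE_SIZE + offset
    data = physical_memory[paddr]  # access #2: the actual data
    memory_accesses += 1
    return data

page_table = {0x4: 0x7}                          # VPN 0x4 -> PPN 0x7
physical_memory = {0x7 * PAGE_SIZE + 0xABC: 42}  # one populated word
print(read_memory(page_table, physical_memory, 0x00004ABC))
print(memory_accesses)  # a single load cost two trips to memory
```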

To minimize performance penalty, we have two options for speed-up that leverage the cache:

  1. Address translation: Use a cache for frequently or recently used page table entries.

  2. Data access: Copy blocks from main memory to the cache.

To address the latter, see our unit on caches and our next section. To address the former, in this section we introduce the translation lookaside buffer.

3 The Translation Lookaside Buffer

The translation lookaside buffer (TLB), or translation buffer, caches address translations, i.e., VPN-PPN mappings. It is usually separate hardware from memory caches and (like memory caches) is stored close to the CPU.

As shown in Figure 2, the TLB stores a subset of address translations for the current process’s page table. The TLB leverages locality and stores recently accessed translations.


Figure 2: The Translation Lookaside Buffer (TLB) stores a subset of address translations.
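A minimal sketch of such a buffer, assuming a fully associative design with LRU replacement (the capacity and the VPN→PPN mappings below are made-up for illustration):

```python
# Toy fully associative TLB with LRU replacement.
# Capacity and mappings are assumptions, not real hardware parameters.
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PPN, least recently used first

    def lookup(self, vpn):
        """Return the PPN on a hit, or None on a miss."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, ppn):
        """Cache a translation fetched by a page table walk."""
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = ppn

tlb = TLB()
tlb.insert(0x4, 0x7)
print(tlb.lookup(0x4))  # hit: returns the cached PPN
print(tlb.lookup(0x5))  # miss: would trigger a page table walk
```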

The TLB reach is the number of virtual addresses that can be immediately translated by the TLB. In other words, it is the size of the largest possible (disjoint) virtual address space that the entries in the TLB can cover:

TLB reach = # TLB entries × Page size
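As a toy example of this formula (the entry count and page size below are assumed values, not specified in this section):

```python
# Example TLB reach computation; both parameters are assumptions.
num_tlb_entries = 64
page_size = 4 * 1024  # 4 KiB pages

tlb_reach = num_tlb_entries * page_size
print(tlb_reach)  # 262144 bytes = 256 KiB of virtual address space covered
```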

If the TLB “hits,” then no page table walk occurs, meaning we avoid accessing memory on address translation. A common TLB design is a small, fully associative cache of recent translations.[2]

4 Address Translation with the TLB

We have now introduced one type of “cache” into our virtual memory system: the TLB. It speeds up address translation, which is the first of the two accesses to the memory hierarchy.

Let us focus on the performance of the address translation by considering the toy scenario in Figure 3. Note that there is no memory cache, i.e., all data accesses must go to memory.


Figure 3: Memory hierarchy layout for this scenario. The page tables and a subset of pages for both processes (firefox and intellij) are stored in memory. Other pages are on disk. There is a TLB. There is no memory cache.

Firefox, the currently active process, requests data at address 0x00004ABC.

Let us compare three cases for translating the requested virtual address.[3]

Case 1: TLB Hit
Case 2: TLB Miss, Page in Memory
Case 3: Page Fault
  1. The requested VPN is in the TLB (e.g., it was recently accessed), so we retrieve the PPN from the TLB entry and form the resulting physical address.

Address translation accesses just the TLB and is close to instant, on the order of a clock cycle.


Figure 4: Case 1 is the best-case scenario: a TLB hit. Because the corresponding translation is available in the TLB, no memory access is needed for address translation.
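Concretely, the requested address splits into a VPN and a page offset. This sketch assumes 4 KiB pages (the page size is an assumption, not stated in the scenario), and the PPN is made up:

```python
# Split the requested virtual address into VPN and page offset,
# assuming 4 KiB pages (12 offset bits); the PPN below is made up.
vaddr = 0x00004ABC
OFFSET_BITS = 12              # log2(4096)

vpn = vaddr >> OFFSET_BITS    # looked up in the TLB
offset = vaddr & 0xFFF        # unchanged by translation
print(hex(vpn), hex(offset))  # 0x4 0xabc

# On a TLB hit mapping VPN 0x4 to (say) PPN 0x7, the physical address is:
ppn = 0x7
paddr = (ppn << OFFSET_BITS) | offset
print(hex(paddr))             # 0x7abc
```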

Table 1: Three address translation cases.

| Case | Performance | TLB | Page Table (in Memory) | Disk |
|------|-------------|-----|------------------------|------|
| 1 | Best (~1 cycle, TLB) | Hit ✅ | Not visited | Not visited |
| 2 | Worse (~100 cycles, memory) | Miss ❌ | Hit (Page Table Entry Valid) ✅ | Not visited |
| 3 | Worst (~1000 cycles, disk) | Miss ❌ | Miss (Page Fault) ❌ | Visited ✅ |
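A back-of-the-envelope calculation shows why a high TLB hit rate matters so much. The latencies come from the rough figures in Table 1; the hit-rate and fault-rate probabilities are made-up assumptions for illustration:

```python
# Expected address translation cost, using Table 1's rough latencies.
# The probabilities are illustrative assumptions, not measured values.
tlb_hit_cycles = 1     # Case 1: TLB hit
memory_cycles = 100    # Case 2: page table walk in memory
disk_cycles = 1000     # Case 3: page fault (far higher in practice)

p_tlb_hit = 0.99       # assumed TLB hit rate
p_page_fault = 0.0001  # assumed fault rate, given a TLB miss

p_tlb_miss = 1 - p_tlb_hit
avg_cycles = (p_tlb_hit * tlb_hit_cycles
              + p_tlb_miss * (1 - p_page_fault) * memory_cycles
              + p_tlb_miss * p_page_fault * disk_cycles)
print(round(avg_cycles, 3))  # close to the 1-cycle best case
```

Even a 1% miss rate roughly doubles the average translation cost here, which is why the TLB leans so heavily on locality.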

Finally, let’s put it all together by including memory caches to speed up data access. Let’s go!

Footnotes
  1. The “walk” terminology makes more sense with hierarchical page tables, where multiple levels of page tables are accessed on each address translation. Hierarchical page tables are out of scope for this course.

  2. In this course, we will assume that the TLB is fully associative. However, in practice, some TLB designs are set associative. In these cases, a Virtual Page Number is split into a TLB tag and a TLB index: the latter is used to determine the index of the set; the former is used to find a matching way within the set. This low-associativity TLB design can support other optimizations in address translation; see P&H Computer Architecture and later courses for details.

  3. The TLB in our scenario keeps track of the PID corresponding to each TLB entry. In Figure 4 and related figures, the grayed out entry has the PID of intellij; other entries all have the PID of firefox (the currently running process).

  4. Imagine that during the context switch, the other process does not update any pages in the TLB. This is unlikely, but our toy scenario is contrived for simplicity.