
1 Learning Outcomes

Consider the address translation discussed in an earlier section. If we have a 1-MiB page table and a 128-KiB L1 cache, the page table must be stored in memory, not in the cache.

2 Page Table Walks

At present, we must perform a page table walk, meaning we must access the page table to get the physical page number for address translation.[1] Remember that the current process’s page table is located in main memory. Because page tables are located in memory (Figure 1), we must access main memory twice on every load or store. This takes several hundred cycles!


Figure 1: In address translation, there can be two accesses to memory on every load or store instruction.
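The two accesses can be sketched as follows. This is a toy model, not the course’s code: the page size, the mappings, and the access counter are all made up for illustration.

```python
# Toy sketch of a load WITHOUT a TLB: every load pays one memory access
# for the page table entry and a second for the data itself.
# (Page size, mappings, and values below are made-up assumptions.)

PAGE_SIZE = 4096  # assume 4 KiB pages

memory_accesses = 0

def read_memory(page_table, physical_memory, vaddr):
    """Translate vaddr via the in-memory page table, then fetch the data."""
    global memory_accesses
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = page_table[vpn]          # access #1: page table walk (in memory)
    memory_accesses += 1
    paddr = ppn * PAGE_SIZE + offset
    data = physical_memory[paddr]  # access #2: the actual data
    memory_accesses += 1
    return data

page_table = {0x4: 0x7}                          # VPN 0x4 -> PPN 0x7
physical_memory = {0x7 * PAGE_SIZE + 0xABC: 42}  # one populated word
print(read_memory(page_table, physical_memory, 0x00004ABC))
print(memory_accesses)  # a single load cost two trips to memory
```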

To minimize performance penalty, we have two options for speed-up that leverage the cache:

  1. Address translation: Use a cache for frequently or recently used page table entries.

  2. Data access: Copy blocks from main memory to the cache.

To address the latter, see our unit on caches and our next section. To address the former, in this section we introduce the translation lookaside buffer.

3 The Translation Lookaside Buffer

The translation lookaside buffer (TLB), or translation buffer, caches address translations, i.e., VPN-PPN mappings. It is usually separate hardware from memory caches and (like memory caches) is stored close to the CPU.

As shown in Figure 2, the TLB stores a subset of address translations for the current process’s page table. The TLB leverages locality and stores recently accessed translations.


Figure 2: The Translation Lookaside Buffer (TLB) stores a subset of address translations.
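A minimal sketch of such a buffer, assuming a fully associative design with LRU replacement (the capacity and the VPN→PPN mappings below are made-up for illustration):

```python
# Toy fully associative TLB with LRU replacement.
# Capacity and mappings are assumptions, not real hardware parameters.
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PPN, least recently used first

    def lookup(self, vpn):
        """Return the PPN on a hit, or None on a miss."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, ppn):
        """Cache a translation fetched by a page table walk."""
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = ppn

tlb = TLB()
tlb.insert(0x4, 0x7)
print(tlb.lookup(0x4))  # hit: returns the cached PPN
print(tlb.lookup(0x5))  # miss: would trigger a page table walk
```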

The TLB reach is the number of virtual addresses that can be immediately translated by the TLB. In other words, it is the size of the largest possible (disjoint) virtual address space that the entries in the TLB can cover:

TLB reach = # TLB entries × Page size
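As a toy example of this formula (the entry count and page size below are assumed values, not specified in this section):

```python
# Example TLB reach computation; both parameters are assumptions.
num_tlb_entries = 64
page_size = 4 * 1024  # 4 KiB pages

tlb_reach = num_tlb_entries * page_size
print(tlb_reach)  # 262144 bytes = 256 KiB of virtual address space covered
```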

If the TLB “hits,” then no page table walk occurs, meaning we avoid accessing memory on address translation. A common TLB design is a small, fully associative cache of recent translations.[2]

4 Address Translation with the TLB

We have now introduced one type of “cache” into our virtual memory system: the TLB. It speeds up address translation, which is the first of the two accesses to the memory hierarchy.

Let us focus on the performance of the address translation by considering the toy scenario in Figure 3. Note that there is no memory cache, i.e., all data accesses must go to memory.


Figure 3: Memory hierarchy layout for this scenario. The page tables and a subset of pages for both processes (firefox and intellij) are stored in memory. Other pages are on disk. There is a TLB. There is no memory cache.

Firefox, the currently active process, requests data at address 0x00004ABC.

Let us compare three cases for translating the requested virtual address.[3]

Case 1: TLB Hit
Case 2: TLB Miss, Page in Memory
Case 3: Page Fault
  1. The requested VPN is in the TLB (e.g., it was recently accessed), so we retrieve the PPN from the TLB entry and form the resulting physical address.

Address translation accesses just the TLB and is close to instant, on the order of a clock cycle.


Figure 4: Case 1 is the best-case scenario: a TLB hit. Because the corresponding translation is available in the TLB, no memory access is needed for address translation.
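Concretely, the requested address splits into a VPN and a page offset. This sketch assumes 4 KiB pages (the page size is an assumption, not stated in the scenario), and the PPN is made up:

```python
# Split the requested virtual address into VPN and page offset,
# assuming 4 KiB pages (12 offset bits); the PPN below is made up.
vaddr = 0x00004ABC
OFFSET_BITS = 12              # log2(4096)

vpn = vaddr >> OFFSET_BITS    # looked up in the TLB
offset = vaddr & 0xFFF        # unchanged by translation
print(hex(vpn), hex(offset))  # 0x4 0xabc

# On a TLB hit mapping VPN 0x4 to (say) PPN 0x7, the physical address is:
ppn = 0x7
paddr = (ppn << OFFSET_BITS) | offset
print(hex(paddr))             # 0x7abc
```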

Table 1: Three address translation cases.

| Case | Performance | TLB | Page Table (in Memory) | Disk |
|------|-------------|-----|------------------------|------|
| 1 | Best (~1 cycle, TLB) | Hit ✅ | Not visited | Not visited |
| 2 | Worse (~100 cycles, memory) | Miss ❌ | Hit (Page Table Entry Valid) ✅ | Not visited |
| 3 | Worst (~1000 cycles, disk) | Miss ❌ | Miss (Page Fault) ❌ | Visited ✅ |
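A back-of-the-envelope calculation shows why a high TLB hit rate matters so much. The latencies come from the rough figures in Table 1; the hit-rate and fault-rate probabilities are made-up assumptions for illustration:

```python
# Expected address translation cost, using Table 1's rough latencies.
# The probabilities are illustrative assumptions, not measured values.
tlb_hit_cycles = 1     # Case 1: TLB hit
memory_cycles = 100    # Case 2: page table walk in memory
disk_cycles = 1000     # Case 3: page fault (far higher in practice)

p_tlb_hit = 0.99       # assumed TLB hit rate
p_page_fault = 0.0001  # assumed fault rate, given a TLB miss

p_tlb_miss = 1 - p_tlb_hit
avg_cycles = (p_tlb_hit * tlb_hit_cycles
              + p_tlb_miss * (1 - p_page_fault) * memory_cycles
              + p_tlb_miss * p_page_fault * disk_cycles)
print(round(avg_cycles, 3))  # close to the 1-cycle best case
```

Even a 1% miss rate roughly doubles the average translation cost here, which is why the TLB leans so heavily on locality.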

Finally, let’s put it all together by including memory caches to speed up data access. Let’s go!

Footnotes
  1. The “walk” terminology makes more sense with hierarchical page tables, where multiple levels of page tables are accessed on each address translation. Hierarchical page tables are out of scope for this course.

  2. In this course, we will assume that the TLB is fully associative. However, in practice, some TLB designs are set associative. In these cases, a Virtual Page Number is split into a TLB tag and a TLB index: the latter is used to determine the index of the set; the former is used to find a matching way within the set. This low-associativity TLB design can support other optimizations in address translation; see P&H Computer Architecture and later courses for details.

  3. The TLB in our scenario keeps track of the PID corresponding to each TLB entry. In Figure 4 and related figures, the grayed out entry has the PID of intellij; other entries all have the PID of firefox (the currently running process).

  4. Imagine that during the context switch, the other process does not update any pages in the TLB. This is unlikely, but our toy scenario is contrived for simplicity.