On domenica 23 luglio 2023 13:56:38 CEST Fabio M. De Francesco wrote: > Extend page_tables.rst by adding a small introductive section about > the role of MMU and TLB in translating between virtual addresses and > physical page frames. Furthermore explain the concepts behind the > Page Faults exceptions and how Linux handles them. Please discard this RFC because I sent it by mistake. The real RFC is "[RFC PATCH v2] Documentation/page_tables: Add info about MMU/ TLB and Page Faults" at https://lore.kernel.org/lkml/20230723120721.7139-1-fmdefrancesco@xxxxxxxxx/ Sorry for the noise. Fabio > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Bagas Sanjaya <bagasdotme@xxxxxxxxx> > Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > Cc: Jonathan Corbet <corbet@xxxxxxx> > Cc: Linus Walleij <linus.walleij@xxxxxxxxxx> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Cc: Mike Rapoport <rppt@xxxxxxxxxx> > Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> > Signed-off-by: Fabio M. De Francesco <fmdefrancesco@xxxxxxxxx> > --- > > v1->v2: Add further information about lower level functions in the page > fault handler and add information about how and why to disable / enable > the page fault handler (provided a link to a Ira's patch that make use > of pagefault_disable() to prevent deadlocks. > > This is an RFC PATCH because of two reasons: > > 1) I've heard that there is consensus about the need to revise and > extend the MM documentation, but I'm not sure about whether or not > developers need these kind of introductory information. > > 2) While preparing this little patch I decided to take a quicj look at > the code and found out it currently is not how I thought I remembered > it. I'm especially speaking about the x86 case. I'm not sure that I've > been able to properly understand what I described as a difference in > workflow compared to most of the other architecture. > > Therefore, for the two reasons explained above, I'd like to hear from > people actively involved in MM. If this is not what you want, feel free > to throw it away. Otherwise I'd be happy to write more on this and other > MM topics. I'm looking forward for comments on this small work. > > Documentation/mm/page_tables.rst | 61 ++++++++++++++++++++++++++++++++ > 1 file changed, 61 insertions(+) > > diff --git a/Documentation/mm/page_tables.rst > b/Documentation/mm/page_tables.rst index 7840c1891751..fa617894fda8 100644 > --- a/Documentation/mm/page_tables.rst > +++ b/Documentation/mm/page_tables.rst > @@ -152,3 +152,64 @@ Page table handling code that wishes to be > architecture-neutral, such as the virtual memory manager, will need to be > written so that it traverses all of the currently five levels. This style > should also be preferred for > architecture-specific code, so as to be robust to future changes. > + > + > +MMU, TLB, and Page Faults > +========================= > + > +The Memory Management Unit (MMU) is a hardware component that handles virtual > to +physical address translations. It uses a relatively small cache in > hardware +called the Translation Lookaside Buffer (TLB) to speed up these > translations. +When a process wants to access a memory location, the CPU > provides a virtual +address to the MMU, which then uses the TLB to quickly > find the corresponding +physical address. > + > +However, sometimes the MMU can't find a valid translation in the TLB. This > +could be because the process is trying to access a range of memory that it's > not +allowed to, or because the memory hasn't been loaded into RAM yet. When > this +happens, the MMU triggers a page fault, which is a type of interrupt > that +signals the CPU to pause the current process and run a special function > to +handle the fault. > + > +One cause of page faults is due to bugs (or maliciously crafted addresses) > and +happens when a process tries to access a range of memory that it doesn't > have +permission to. This could be because the memory is reserved for the > kernel or +for another process, or because the process is trying to write to > a read-only +section of memory. When this happens, the kernel sends a > Segmentation Fault +(SIGSEGV) signal to the process, which usually causes the > process to terminate. + > +An expected and more common cause of page faults is "lazy allocation". This > is +a technique used by the Kernel to improve memory efficiency and reduce > +footprint. Instead of allocating physical memory to a process as soon as > it's +requested, the kernel waits until the process actually tries to use the > memory. +This can save a significant amount of memory in cases where a > process requests +a large block but only uses a small portion of it. > + > +A related technique is "Copy-on-Write" (COW), where the Kernel allows > multiple +processes to share the same physical memory as long as they're only > reading +from it. If a process tries to write to the shared memory, the > kernel triggers +a page fault and allocates a separate copy of the memory for > the process. This +allows the kernel to save memory and avoid unnecessary > data copying and, by +doing so, it reduces latency. > + > +Now, let's see how the Linux kernel handles these page faults: > + > +1. For most architectures, `do_page_fault()` is the primary interrupt handler > + for page faults. It delegates the actual handling of the page fault to + > `handle_mm_fault()`. This function checks the cause of the page fault and + > takes the appropriate action, such as loading the required page into + > memory, granting the process the necessary permissions, or sending a + > SIGSEGV signal to the process. > + > +2. In the specific case of the x86 architecture, the interrupt handler is > + defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls > + `handle_page_fault()`. This function then calls either > + `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether > + the fault occurred in user space or kernel space. Both of these functions > + eventually lead to `handle_mm_fault()`, similar to the workflow in other > + architectures. > + > +The actual implementation of the workflow is very complex. Its design allows > +Linux to handle page faults in a way that is tailored to the specific > +characteristics of each architecture, while still sharing a common overall > +structure. > -- > 2.41.0