On Thu, Nov 14, 2024 at 10:12:00PM +0100, Jann Horn wrote:
> Make it clearer that holding the mmap lock in read mode is not enough
> to traverse page tables, and that just having a stable VMA is not enough
> to read PTEs.
>
> Suggested-by: Matteo Rizzo <matteorizzo@xxxxxxxxxx>
> Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>

Have some queries before we move forward so would like a little more
clarification/perhaps putting some extra meat on the bones first.

Broadly very glad you have done this however so it's just sorting details
first! :>)

> ---
> @akpm: Please don't put this in your tree before Lorenzo has replied.
>
> @Lorenzo:
> This is intended to go on top of your documentation patch.
> If you think this is a sensible change, do you prefer to squash it into
> your patch or do you prefer having akpm take this as a separate patch?
> IDK what works better...

I think a new patch is better, as I'd like the original to settle down now
and the whole point of this doc is that it's a living thing that many people
can contribute to, update, etc.

For instance, Suren is updating as part of one of his series to correct
things that he changes in that series, which is really nice.

> ---
>  Documentation/mm/process_addrs.rst | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
> index 1bf7ad010fc063d003bb857bb3b695a3eafa0b55..9bdf073d0c3ebea1707812508a309aa4a6163660 100644
> --- a/Documentation/mm/process_addrs.rst
> +++ b/Documentation/mm/process_addrs.rst
> @@ -339,6 +339,16 @@ When **installing** page table entries, the mmap or VMA lock must be held to
>  keep the VMA stable. We explore why this is in the page table locking details
>  section below.
>
> +.. warning:: Taking the mmap lock in read mode **is not sufficient** for
> +             traversing page tables; you must also ensure that a VMA exists that
> +             covers the range being accessed.

Hm, but we say later we don't need _any_ locks for traversal, and here we
say we need mmap read lock. Do you mean installing page table entries?

Or do you mean to say that if you don't span a VMA, you must acquire a
write lock at least to preclude this? This seems quite unclear.

I kind of didn't want to touch on the horrors of fiddling about without a
VMA, so I'd rather this very clearly say something like 'it is unusual to
manipulate page tables which are not spanned by a VMA, and there are special
requirements for this operation' etc., otherwise this just adds more noise
and confusion I think.

> +             This ensures you can't race with concurrent page table removal
> +             which happens with the mmap lock in read mode, in regions whose
> +             VMAs are no longer present in the VMA tree.
> +
> +             (Alternatively, the mmap lock can be taken in write mode, but that
> +             is heavy-handed and almost never the right choice.)

You kind of need to expand on why that is I think!

> +
>  **Freeing** page tables is an entirely internal memory management operation and
>  has special requirements (see the page freeing section below for more details).
>
> @@ -450,6 +460,9 @@ the time of writing of this document.
>  Locking Implementation Details
>  ------------------------------
>
> +.. warning:: Locking rules for PTE-level page tables are very different from
> +             locking rules for page tables at other levels.
> +
>  Page table locking details
>  --------------------------
>
> @@ -470,8 +483,12 @@ additional locks dedicated to page tables:
>  These locks represent the minimum required to interact with each page table
>  level, but there are further requirements.
>
> -Importantly, note that on a **traversal** of page tables, no such locks are
> -taken. Whether care is taken on reading the page table entries depends on the
> +Importantly, note that on a **traversal** of page tables, sometimes no such
> +locks are taken. However, at the PTE level, at least concurrent page table
> +deletion must be prevented (using RCU) and the page table must be mapped into
> +high memory, see below.

Ugh I really do hate that we have to think about high memory. I'd like to
sort of deny it exists. But I suppose that's not an option.

As for the RCU thing, I guess this is why pte_offset_map_lock() is taking
it? Maybe worth mentioning something there or updating that 'interestingly'
block... :>)

Or am I mistaken? I wasn't aware of this requirement, is this sort of
implied by the gup_fast() IRQ disabling stuff? Please expand :)

> +
> +Whether care is taken on reading the page table entries depends on the
>  architecture, see the section on atomicity below.
>
>  Locking rules
>
> ---
> base-commit: 1e96a63d3022403e06cdda0213c7849b05973cd5
> change-id: 20241114-vma-docs-addition1-onv3-32df4e6dffcf
>
> --
> Jann Horn <jannh@xxxxxxxxxx>
>

Thanks for this, your input is hugely appreciated both in the review and now
this, you're a gem!

Cheers, Lorenzo