+cc Suren, linux-doc who I mistakenly didn't cc in first email! On Mon, Nov 04, 2024 at 03:47:56PM +0100, Alice Ryhl wrote: > On Fri, Nov 1, 2024 at 7:50 PM Lorenzo Stoakes > <lorenzo.stoakes@xxxxxxxxxx> wrote: > > > > Locking around VMAs is complicated and confusing. While we have a number of > > disparate comments scattered around the place, we seem to be reaching a > > level of complexity that justifies a serious effort at clearly documenting > > how locks are expected to be interacted with when it comes to interacting > > with mm_struct and vm_area_struct objects. > > > > This is especially pertinent as regards efforts to find sensible > > abstractions for these fundamental objects within the kernel rust > > abstraction whose compiler strictly requires some means of expressing these > > rules (and through this expression can help self-document these > > requirements as well as enforce them which is an exciting concept). > > > > The document limits scope to mmap and VMA locks and those that are > > immediately adjacent and relevant to them - so additionally covers page > > table locking as this is so very closely tied to VMA operations (and relies > > upon us handling these correctly). > > > > The document tries to cover some of the nastier and more confusing edge > > cases and concerns especially around lock ordering and page table teardown. > > > > The document also provides some VMA lock internals, which are up to date > > and inclusive of recent changes to recent sequence number changes. > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> > > [...] > > > +Page table locks > > +---------------- > > + > > +When allocating a P4D, PUD or PMD and setting the relevant entry in the above > > +PGD, P4D or PUD, the `mm->page_table_lock` is acquired to do so. This is > > +acquired in `__p4d_alloc()`, `__pud_alloc()` and `__pmd_alloc()` respectively. > > + > > +.. note:: > > + `__pmd_alloc()` actually invokes `pud_lock()` and `pud_lockptr()` in turn, > > + however at the time of writing it ultimately references the > > + `mm->page_table_lock`. > > + > > +Allocating a PTE will either use the `mm->page_table_lock` or, if > > +`USE_SPLIT_PMD_PTLOCKS` is defined, used a lock embedded in the PMD physical > > +page metadata in the form of a `struct ptdesc`, acquired by `pmd_ptdesc()` > > +called from `pmd_lock()` and ultimately `__pte_alloc()`. > > + > > +Finally, modifying the contents of the PTE has special treatment, as this is a > > +lock that we must acquire whenever we want stable and exclusive access to > > +entries pointing to data pages within a PTE, especially when we wish to modify > > +them. > > + > > +This is performed via `pte_offset_map_lock()` which carefully checks to ensure > > +that the PTE hasn't changed from under us, ultimately invoking `pte_lockptr()` > > +to obtain a spin lock at PTE granularity contained within the `struct ptdesc` > > +associated with the physical PTE page. The lock must be released via > > +`pte_unmap_unlock()`. > > + > > +.. note:: > > + There are some variants on this, such as `pte_offset_map_rw_nolock()` when we > > + know we hold the PTE stable but for brevity we do not explore this. > > + See the comment for `__pte_offset_map_lock()` for more details. > > + > > +When modifying data in ranges we typically only wish to allocate higher page > > +tables as necessary, using these locks to avoid races or overwriting anything, > > +and set/clear data at the PTE level as required (for instance when page faulting > > +or zapping). > > Speaking as someone who doesn't know the internals at all ... this > section doesn't really answer any questions I have about the page > table. It looks like this could use an initial section about basic > usage, and the detailed information could come after? Concretely, if I > wish to call vm_insert_page or zap some pages, what are the locking > requirements? What if I'm writing a page fault handler? Ack totally agree, I think we need this document to serve two purposes - one is to go over, in detail, the locking requirements from an mm dev's point of view with internals focus, and secondly to give those outside mm this kind of information. It's good to get insight from an outside perspective as inevitably we mm devs lose sight of the wood for the trees when it comes to internals vs. practical needs of those who make use of mm in one respect or another. So this kind of feedback is very helpful and welcome :) TL;DR - yes I will explicitly state what is required for various operations on the respin. > > Alice As a wordy aside, a large part of the motivation of this document, or certainly my prioritisation of it, is explicitly to help the rust team correctly abstract this aspect of mm. The other part is to help the mm team, that is especailly myself, correctly understand and _remember_ the numerous painful ins and outs of this stuff, much of which has been pertinent of late for not wonderfully positive reasons. Hopefully we accomplish both! :>)