Re: [RFC PATCH] docs/mm: add VMA locks documentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 5, 2024 at 5:10 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
> On Mon, Nov 04, 2024 at 10:29:47PM +0100, Jann Horn wrote:
> > On Mon, Nov 4, 2024 at 5:42 PM Lorenzo Stoakes
> > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > On Sat, Nov 02, 2024 at 02:45:35AM +0100, Jann Horn wrote:
> > > > On Fri, Nov 1, 2024 at 7:50 PM Lorenzo Stoakes
> > > > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > > 3. PMD entries that point to page tables can be changed while holding
> > > > the page table spinlocks for the entry and the table it points to
> > >
> > > Hm wut? When you say 'entry' what do you mean? Obviously a page table in
> >
> > By "PMD entry" I mean a pmd_t (located in a Page Middle Directory),
> > and by "that point to page tables" I mean "that point to a PTE-level
> > page table".
> >
> > In other words, from the reader perspective (as I think you already
> > mentioned further down):
> >
> > Rule 2 means: From the perspective of a reader who is holding the VMA
> > lock in read mode, once you have seen that e.g. a PUD entry
> > (overlapping the VMA's virtual address region) points to a PMD page
> > table, you know that this PUD entry will keep pointing to that PMD
> > table.
> >
> > Rule 3 means: From the perspective of a reader who is holding the VMA
> > lock in read mode, once you have seen that a PMD entry (overlapping
> > the VMA's virtual address region) points to a page table, you don't
> > know whether this PMD entry will keep pointing to the same page table
> > unless you're also holding a spinlock on either the PMD or the page
> > table (because khugepaged).
>
> Thanks right I see what you mean.
>
> Might be worth having an explicit THP (thus khugepaged) section? And
> perhaps even KSM...

Maybe, yeah - I think it's important to roughly know what they do, but
I would still focus on what rules other parts of MM, or users of MM,
have to follow to not break in their interaction with things like THP
and KSM. So maybe kinda do it in the direction of "these are the rules
(and here is the detail of why we have those arbitrary-looking
rules)"?

> > But you're right, I was being imprecise - as you pointed out, it's not
> > just used for zapping. Maybe the right version of 6 is something like:
> >
> >     Leaf entries that are not in "none" state can
> >     be changed while holding any one of [...].
> >
> > Though I'm not sure if that is overly broad - I think in practice the
> > changes we make under the rmap locks are something like the following,
> > though that might be missing some:
> >
> >  - zapping leaf entries
> >  - zapping PMD entries pointing to page tables
> >  - clearing A/D bits
> >  - migration
> >
> > > OK so interestingly this really aligns with what Alice said as to this not
> > > giving a clear indicator from a user's perspective as to 'what lock do I
> > > need to hold'.
> > >
> > > So I will absolutely address all this and try to get the fundamentals
> > > boiled down.
> > >
> > > Also obviously the exception to your rules are - _freeing_ of higher level
> > > page tables because we assume we are in a state where nothing can access
> > > them so no such locks are required. But I cover that below.
> > >
> > > >
> > > > And then the rules for readers mostly follow from that:
> > > > 1 => holding the appropriate page table lock makes the contents of a
> > > > page table stable, except for A/D updates
> > > > 2 => page table entries higher than PMD level that point to lower page
> > > > tables can be followed without taking page table locks
> > >
> > > Yeah this is true actually, might be worth mentioning page table walkers
> > > here and how they operate as they're instructive on page table locking
> > > requirements.
> > >
> > > > 3+4 => following PMD entries pointing to page tables requires careful
> > > > locking, and pte_offset_map_lock() does that for you
> > >
> > > Well, pte_offset_map_lock() is obtained at the PTE level right?
> >
> > pte_offset_map_lock() is given a pointer to a PMD entry, and it
> > follows the PMD entry to a PTE-level page table. My point here is that
> > you can't just simply start a "lock this PTE-level page table"
> > operation at the PTE level because by the time you've locked the page
> > table, the PMD entry may have changed, and the page table you just
> > locked may be empty and doomed to be deleted after RCU delay. So you
> > have to use __pte_offset_map_lock(), which takes a pointer to a PMD
> > entry, and in a loop, looks up the page table from the PMD entry,
> > locks the referenced page table, rechecks that the PMD entry still
> > points to the locked page table, and if not, retries all these steps
> > until it manages to lock a stable page table.
>
> Right yeah, I mean this is kind of a standard pattern in the kernel though
> like:
>
> 1. Grab some pointer to something
> 2. Lock
> 3. Really make sure it hasn't disappeared from under us
> 4. If so, unlock and try again
> 5. Otherwise proceed
>
> You have this pattern with folios too...

Yeah, I agree the pattern you need for the access is not that weird,
it's just weird that you need it for page tables at one specific
level.

> > > pmd_lock() at the PMD level (pud_lock() ostensibly at PUD level but this
> > > amounts to an mm->page_table_lock anyway there)
> >
> > > > I think something like
> > > > https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#overview-documentation-comments
> > > > is supposed to let you include the current version of the comment into
> > > > the rendered documentation HTML without having to manually keep things
> > > > in sync. I've never used that myself, but there are a bunch of
> > > > examples in the tree; for example, grep for "DMA fences overview".
> > >
> > > Ah, but this isn't a kernel doc is just a raw comment :) so I'm not sure there
> > > is a great way of grabbing just that, reliably. Maybe can turn that into a
> > > kernel doc comment in a follow up patch or something?
> >
> > Ah, yeah, sounds reasonable.
>
> Thanks.
>
>
> I think all this makes me think that we should actually have entirely
> separate top level descriptions and internals sections in this document,
> which align's again with Alice's comments.
>
> As the level of detail and caveats here mean that if you provide
> implementation details everywhere you end up constantly on a tangent
> (important, relevant internal details but to a _user_ of the functionality
> not so important).

Hmm, yeah.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux