On Sun, Oct 30, 2022 at 11:19 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> And we'd _like_ to do the TLB flush before the remove_rmap(), but we
> *really* don't want to do that for every page.

Hmm. I have yet another crazy idea.

We could keep the current placement of the TLB flush, to just before we
drop the page table lock. And we could do all the things we do in
'page_remove_rmap()' right now *except* for the mapcount stuff, and only
move the mapcount code to the page freeing stage.

Because all the rmap() walk synchronization really needs is that
'page->_mapcount' is still elevated, and if it is, it will serialize with
the page table lock.

And it turns out that 'page_remove_rmap()' already treats the case we
care about differently, and all it does is

	lock_page_memcg(page);

	if (!PageAnon(page)) {
		page_remove_file_rmap(page, compound);
		goto out;
	}
	...
out:
	unlock_page_memcg(page);

	munlock_vma_page(page, vma, compound);

for that case. And that 'page_remove_file_rmap()' is literally the code
that modifies the _mapcount.

Annoyingly, this is all complicated by that 'compound' argument, but
that's always false in the zap_page_range() case.

So what we *could* do is make a new version of page_remove_rmap() which
is specialized for this case: no 'compound' argument (always false), and
it doesn't call 'page_remove_file_rmap()', because we'll do that for the
!PageAnon(page) case later, after the TLB flush.

That would keep the existing TLB flush logic, keep the existing 'mark
page dirty' behavior, and would just make sure that 'folio_mkclean()'
ends up being serialized with the TLB flush, simply because it will take
the page table lock, since we delay the '_mapcount' update until
afterwards.

Annoyingly, the organization of 'page_remove_rmap()' is a bit ugly, and
we have several other callers that want the existing logic, so while the
above sounds conceptually simple, I think the patch would be a bit messy.

            Linus
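
A minimal sketch of the specialized helper described above, not the actual
patch: the name 'page_zap_pte_rmap' is hypothetical, and only the
page_remove_rmap()/munlock_vma_page() signatures from that kernel series
are assumed.

	/*
	 * Hypothetical zap-time variant of page_remove_rmap() for the
	 * non-compound case.  For file-backed pages it deliberately does
	 * NOT call page_remove_file_rmap(), i.e. page->_mapcount stays
	 * elevated here; the mapcount drop happens later, at the page
	 * freeing stage, after the TLB flush.
	 */
	static void page_zap_pte_rmap(struct page *page, struct vm_area_struct *vma)
	{
		if (PageAnon(page)) {
			/* Anon pages: keep the existing path unchanged */
			page_remove_rmap(page, vma, false);
			return;
		}

		/*
		 * !PageAnon: skip the _mapcount update entirely; only the
		 * munlock side of page_remove_rmap() is done now.
		 */
		munlock_vma_page(page, vma, false);
	}

The only point of the sketch is to show where the mapcount drop falls out
of the zap path: file pages keep an elevated _mapcount until after the TLB
flush, so rmap walkers like folio_mkclean() still end up serializing on
the page table lock.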