Re: Can the huge zero page be partially mapped?

David Hildenbrand <david@xxxxxxxxxx> · Mon, 4 Mar 2024 22:52:44 +0100

On 04.03.24 20:19, Yang Shi wrote:
> On Mon, Mar 4, 2024 at 8:54 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>
>> I looked at the definition of is_huge_zero_page():
>>
>> static inline bool is_huge_zero_page(struct page *page)
>> {
>>         return READ_ONCE(huge_zero_page) == page;
>> }
>>
>> That made me raise my eyebrows a bit because it will return false for
>> tail pages of the HZP (that was at least unexpected for me).  Then we
>> have this beauty:
>>
>> void free_page_and_swap_cache(struct page *page)
>> {
>>         struct folio *folio = page_folio(page);
>>
>>         free_swap_cache(folio);
>>         if (!is_huge_zero_page(page))
>>                 folio_put(folio);
>> }
>>
>> So if we can call free_page_and_swap_cache() with a tail of the HZP
>> we can absolutely screw up its refcounting.  Now, we have VM_BUGs
>> to catch the refcount going below 0, and I haven't seen them being
>> hit, so I _presume_ it doesn't happen, but maybe somebody inventive
>> could come up with a way of putting a HZP tail into a page table ...?
>
> The huge zero pmd split is specially handled by
> __split_huge_zero_page_pmd(), which actually replaces every subpage
> of the HZP with the zero page.

Right.

The only thing that can happen is that we GUP a part of the huge
zeropage (FOLL_PIN only; FOLL_LONGTERM/FOLL_WRITE would trigger a fault
first and map us an anon folio), and unpinning would drop these references.

unpin_user_page()->gup_put_folio()->folio_put_refs() would call
__folio_put() once the refcount drops to zero.

Not sure if __folio_put() does the right thing, but I hope so :) Did not 
look into the details.
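
To make the pin/unpin path above concrete, here is a minimal userspace
sketch. It is not kernel code: the structs and every *_model() function
are hypothetical stand-ins, and the refcounting is deliberately
simplified (one reference and one pin per pinned subpage). It only
illustrates that pinning through a tail page takes the reference on the
shared folio and that unpinning drops the same amount again.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins, not the kernel's struct folio / struct page. */
struct folio {
	int refcount;	/* models folio->_refcount */
	int pincount;	/* models folio->_pincount of a large folio */
};

struct page {
	struct folio *folio;	/* head and tail pages share one folio */
};

/* Model of page_folio(): a tail page resolves to its head's folio. */
static struct folio *page_folio_model(struct page *page)
{
	return page->folio;
}

/* Simplified FOLL_PIN on one subpage: one reference plus one pin. */
static void pin_page_model(struct page *page)
{
	struct folio *folio = page_folio_model(page);

	folio->refcount++;
	folio->pincount++;
}

/*
 * Simplified folio_put_refs(): returns true when the count hits zero,
 * which is the point where the kernel would call __folio_put().
 */
static bool folio_put_refs_model(struct folio *folio, int refs)
{
	assert(folio->refcount >= refs);	/* stands in for the VM_BUG checks */
	folio->refcount -= refs;
	return folio->refcount == 0;
}

/* Simplified unpin_user_page() -> gup_put_folio() for a large folio. */
static bool unpin_page_model(struct page *page)
{
	struct folio *folio = page_folio_model(page);

	folio->pincount--;
	return folio_put_refs_model(folio, 1);
}
```

In this model, pinning and unpinning a tail page leaves the folio's
count where it started, so the free path is only reached if some other
path drops a reference it never took.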

In folios_put_refs() we do have is_huge_zero_page() special handling; I
guess that is for ordinary zap/unmap and likely the right thing to do.

That looks a bit inconsistent (folio_put_refs() vs. folios_put_refs()).
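
A small userspace model of that inconsistency, under loud assumptions:
the types and *_model() names are invented for illustration, and the
"checked" path only mimics the style of the is_huge_zero_page() test
quoted above. It shows two things: a put path with the special case and
one without treat the HZP head differently, and the pointer comparison
does not recognise a tail page.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins, not the kernel definitions. */
struct folio { int refcount; };
struct page { struct folio *folio; };

/* Models the global huge_zero_page pointer, which names the head page. */
static struct page *huge_zero_page_model;

/* Same pointer comparison as the quoted is_huge_zero_page(). */
static bool is_huge_zero_page_model(const struct page *page)
{
	return huge_zero_page_model == page;
}

/* Drops one reference; the assert stands in for the VM_BUG checks. */
static void folio_put_model(struct folio *folio)
{
	assert(folio->refcount > 0);
	folio->refcount--;
}

/* Put path with the special case: skips the drop for the HZP head. */
static void put_page_checked_model(struct page *page)
{
	if (!is_huge_zero_page_model(page))
		folio_put_model(page->folio);
}

/* Put path without the special case: always drops a reference. */
static void put_page_plain_model(struct page *page)
{
	folio_put_model(page->folio);
}
```

Passing the head page to the checked path leaves the count alone, while
a tail page of the same folio slips past the check and drops a
reference, just like the plain path does for any page.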

Likely, we should also not perform any refcounting on the huge zeropage
in GUP, just like we do for the ordinary zeropage nowadays. [ccing Dave]
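
What skipping the refcounting could look like, as a hedged userspace
sketch only: the is_zero / is_huge_zero flags and all *_model()
functions are hypothetical and stand in for whatever check the kernel
would actually use. The idea is simply that pin and unpin both return
early for zero folios, so the two sides stay symmetric without touching
the counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in; the two flags are illustrative assumptions. */
struct folio {
	int refcount;
	int pincount;
	bool is_zero;		/* models the ordinary zeropage */
	bool is_huge_zero;	/* models the huge zeropage */
};

/* Hypothetical predicate: folios GUP would not refcount at all. */
static bool gup_skips_refcount_model(const struct folio *folio)
{
	return folio->is_zero || folio->is_huge_zero;
}

/* Simplified pin: no-op for zero folios, otherwise take refs and pins. */
static void pin_folio_model(struct folio *folio, int refs)
{
	if (gup_skips_refcount_model(folio))
		return;
	folio->refcount += refs;
	folio->pincount += refs;
}

/* Simplified unpin: must mirror the pin side exactly. */
static void unpin_folio_model(struct folio *folio, int refs)
{
	if (gup_skips_refcount_model(folio))
		return;
	assert(folio->pincount >= refs && folio->refcount >= refs);
	folio->pincount -= refs;
	folio->refcount -= refs;
}
```

With both sides skipping, it no longer matters whether the caller holds
a head or a tail page of the huge zeropage: no reference is taken, so
none can be dropped by mistake.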

--
Cheers,

David / dhildenb