On 04.03.24 20:19, Yang Shi wrote:
On Mon, Mar 4, 2024 at 8:54 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
I looked at the definition of is_huge_zero_page():
static inline bool is_huge_zero_page(struct page *page)
{
return READ_ONCE(huge_zero_page) == page;
}
That made me raise my eyebrows a bit because it will return false for
tail pages of the HZP (that was at least unexpected for me). Then we
have this beauty:
void free_page_and_swap_cache(struct page *page)
{
struct folio *folio = page_folio(page);
free_swap_cache(folio);
if (!is_huge_zero_page(page))
folio_put(folio);
}
So if we can call free_page_and_swap_cache() with a tail of the HZP
we can absolutely screw up its refcounting. Now, we have VM_BUGs
to catch the refcount going below 0, and I haven't seen them being
hit, so I _presume_ it doesn't happen, but maybe somebody inventive
could come up with a way of putting a HZP tail into a page table ...?
The huge zero pmd split is specially handled by
__split_huge_zero_page_pmd(), which actually replaces every subpages
of HZP to zero page.
Right.
The only thing that can happen is that we GUP a part of the huge
zeropage (FOLL_PIN only, FOLL_LONGTERM/FOLL_WRITE would trigger a fault
first and map us an anon folio), and unpinning would drop these references.
unpin_user_page()->gup_put_folio()->folio_put_refs() would call
__folio_put().
Not sure if __folio_put() does the right thing, but I hope so :) Did not
look into the details.
In folios_put_refs() we do have is_huge_zero_page() special handling, I
guess that is for ordinary zap/unmap and likely the right thing to do.
Looks a bit inconsistent. (folio_put_refs() vs. folios_put_refs())
Likely, we should also not perform any refcounting on the huge zeropage
in GUP, just like we do for the ordinary zeropage nowdays. [ccing Dave]
--
Cheers,
David / dhildenb