On 2024/3/16 3:22, Jane Chu wrote:
> On 3/15/2024 1:32 AM, Miaohe Lin wrote:
>
>> On 2024/3/13 9:23, Jane Chu wrote:
>>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>>
>>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>>> split/freed/whatever, these three sequences are equivalent:
>>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>>
>>>>>> 1       if (PageCompound(p))
>>>>>>
>>>>>> 2       struct page *head = compound_head(p);
>>>>>> 2       if (PageHead(head))
>>>>>>
>>>>>> 3       struct folio *folio = page_folio(p);
>>>>>> 3       if (folio_test_large(folio))
>>>>>>
>>>>>> .
>>>>>>
>>>>> But please see below commit:
>>>>>
>>>>> """
>>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>>> Author: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>>
>>>>>     hwpoison: fix race with changing page during offlining
>>>>>
>>>>>     When a hwpoison page is locked it could change state due to parallel
>>>>>     modifications.  The original compound page can be torn down and then
>>>>>     this 4k page becomes part of a differently-size compound page is is a
>>>>>     standalone regular page.
>>>>>
>>>>>     Check after the lock if the page is still the same compound page.
>>>> I can't speak to what the rules were ten years ago, but this is not
>>>> true now.  Compound pages cannot be split if you hold a refcount.
>>>> Since we don't track a per-page refcount, we wouldn't know which of
>>>> the split pages to give the excess refcount to.
>>> I noticed this recently
>>>
>>>  * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>>  * they are not mapped.
>>>  *
>>>  * Returns 0 if the hugepage is split successfully.
>>>  * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>>  * us.
>>>  */
>>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> {
>>>
>>> I have a test case with poisoned shmem THP page that was mlocked and
>>>
>>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>> Can you elaborate your test case a little bit more detail? There is a check in split_huge_page_to_list():
>>
>> /* Racy check whether the huge page can be split */
>> bool can_split_folio(struct folio *folio, int *pextra_pins)
>> {
>>         int extra_pins;
>>
>>         /* Additional pins from page cache */
>>         if (folio_test_anon(folio))
>>                 extra_pins = folio_test_swapcache(folio) ?
>>                                 folio_nr_pages(folio) : 0;
>>         else
>>                 extra_pins = folio_nr_pages(folio);
>>         if (pextra_pins)
>>                 *pextra_pins = extra_pins;
>>         return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
>> }
>>
>> So a large folio can only be split if only one extra page refcnt is held. It means large folio won't be split from
>> under us if we hold an page refcnt. Or am I miss something?
> My experiment was with an older kernel, though the can_split check is the same.
> Also, I was emulating GUP pin with a hack: in madvise_inject_error(), replaced
> get_user_pages_fast(start, 1, 0, &page) with
> pin_user_pages_fast(start, 1, FOLL_WRITE|FOLL_LONGTERM, &page)

IIUC, get_user_pages_fast() and pin_user_pages_fast(FOLL_LONGTERM) will both call try_grab_folio() to fetch extra page refcnt.
get_user_pages_fast() will have FOLL_GET set while pin_user_pages_fast() will have FOLL_PIN set.
It seems they works same for large folio about page refcnt.

 *
 *    FOLL_GET: folio's refcount will be incremented by @refs.
 *
 *    FOLL_PIN on large folios: folio's refcount will be incremented by
 *    @refs, and its pincount will be incremented by @refs.
 *
 *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
 *    @refs * GUP_PIN_COUNTING_BIAS.
 *
 * Return: The folio containing @page (with refcount appropriately
 * incremented) for success, or NULL upon failure. If neither FOLL_GET
 * nor FOLL_PIN was set, that's considered failure, and furthermore,
 * a likely bug in the caller, so a warning is also emitted.
 */
struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)

They will both call try_get_folio(page, refs) to fetch the page refcnt. So your hack with emulating GUP pin seems
doesn't work as you expected. Or am I miss something?

Thanks.

> I suspect something might be wrong with my hack, I'm trying to reproduce with real GUP pin and on a newer kernel.
> Will keep you informed.
> thanks!
> -jane
>
>
>>
>> Thanks.
>>
>>> thanks,
>>>
>>> -jane
>>>
>>> .
> .
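
To make the refcount arithmetic above concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: `struct toy_folio`, `toy_can_split()` and `toy_grab()` are hypothetical names invented for illustration. The only things it borrows from the thread are the comparison in the quoted can_split_folio() and the rule from the quoted try_grab_folio() comment that, on a large folio, both FOLL_GET and FOLL_PIN raise the refcount by @refs. The pincount is deliberately left out, since the quoted check does not consult it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
    int nr_pages;   /* number of pages in the folio */
    int mapcount;   /* stand-in for folio_mapcount() */
    int refcount;   /* stand-in for folio_ref_count() */
    bool anon;      /* stand-in for folio_test_anon() */
    bool swapcache; /* stand-in for folio_test_swapcache() */
};

/* Same arithmetic as the can_split_folio() quoted in the thread. */
static bool toy_can_split(const struct toy_folio *f, int *pextra_pins)
{
    int extra_pins;

    assert(f != NULL && f->nr_pages > 0);

    /* Additional references held by the page cache or swap cache. */
    if (f->anon)
        extra_pins = f->swapcache ? f->nr_pages : 0;
    else
        extra_pins = f->nr_pages;
    if (pextra_pins)
        *pextra_pins = extra_pins;

    /* Only the one reference held by the would-be splitter is tolerated. */
    return f->mapcount == f->refcount - extra_pins - 1;
}

/*
 * Assumption taken from the quoted try_grab_folio() comment: on a large
 * folio, FOLL_GET and FOLL_PIN both add refs to the refcount, so this
 * model treats them identically.
 */
static void toy_grab(struct toy_folio *f, int refs)
{
    f->refcount += refs;
}
```

As an assumed scenario, take a page-cache (non-anon) folio with nr_pages = 512, mapcount = 1 and refcount = 514 (512 from the page cache, 1 from the mapping, 1 held by the caller attempting the split). toy_can_split() returns true for it. After toy_grab(&f, 1) the refcount is 515 and the check fails, whichever of the two GUP flavors took the reference.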