On Wed, Jul 28, 2021 at 2:36 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: > > On 7/28/21 12:51 AM, Dan Williams wrote: > > On Thu, Jul 15, 2021 at 5:01 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: > >> On 7/15/21 12:36 AM, Dan Williams wrote: > >>> On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: > >> This patch is not the culprit, the flaw is early in the series, specifically the fourth patch. > >> > >> It needs this chunk below change on the fourth patch due to the existing elevated page ref > >> count at zone device memmap init. put_page() called here in memunmap_pages(): > >> > >> for (i = 0; i < pgmap->nr_ranges; i++) > >> for_each_device_pfn(pfn, pgmap, i) > >> put_page(pfn_to_page(pfn)); > >> > >> ... on a zone_device compound memmap would otherwise always decrease head page refcount by > >> @geometry pfn amount (leading to the aforementioned splat you reported). > >> > >> diff --git a/mm/memremap.c b/mm/memremap.c > >> index b0e7b8cf3047..79a883af788e 100644 > >> --- a/mm/memremap.c > >> +++ b/mm/memremap.c > >> @@ -102,15 +102,15 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id) > >> return (range->start + range_len(range)) >> PAGE_SHIFT; > >> } > >> > >> -static unsigned long pfn_next(unsigned long pfn) > >> +static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn) > >> { > >> if (pfn % 1024 == 0) > >> cond_resched(); > >> - return pfn + 1; > >> + return pfn + pgmap_pfn_geometry(pgmap); > > > > The cond_resched() would need to be fixed up too to something like: > > > > if (pfn % (1024 << pgmap_geometry_order(pgmap))) > > cond_resched(); > > > > ...because the goal is to take a break every 1024 iterations, not > > every 1024 pfns. > > > > Ah, good point. > > >> } > >> > >> #define for_each_device_pfn(pfn, map, i) \ > >> - for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn)) > >> + for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn)) > >> > >> static void dev_pagemap_kill(struct dev_pagemap *pgmap) > >> { > >> > >> It could also get this hunk below, but it is sort of redundant provided we won't touch > >> tail page refcount through out the devmap pages lifetime. This setting of tail pages > >> refcount to zero was in pre-v5.14 series, but it got removed under the assumption it comes > >> from the page allocator (where tail pages are already zeroed in refcount). > > > > Wait, devmap pages never see the page allocator? > > > "where tail pages are already zeroed in refcount" this actually meant 'freshly allocated > pages' and I was referring to commit 7118fc2906e2 ("hugetlb: address ref count racing in > prep_compound_gigantic_page") that removed set_page_count() because the setting of page > ref count to zero was redundant. Ah, maybe include that reference in the changelog? > > Albeit devmap pages don't come from page allocator, you know separate zone and these pages > aren't part of the regular page pools (e.g. accessible via alloc_pages()), as you are > aware. Unless of course, we reassign them via dax_kmem, but then the way we map the struct > pages would be regular without any devmap stuff. Got it. I think with the back reference to that commit (7118fc2906e2) it resolves my confusion. > > >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 96975edac0a8..469a7aa5cf38 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -6623,6 +6623,7 @@ static void __ref memmap_init_compound(struct page *page, unsigned > >> long pfn, > >> __init_zone_device_page(page + i, pfn + i, zone_idx, > >> nid, pgmap); > >> prep_compound_tail(page, i); > >> + set_page_count(page + i, 0); > > > > Looks good to me and perhaps a for elevated tail page refcount at > > teardown as a sanity check that the tail pages was never pinned > > directly? > > > Sorry didn't follow completely. > > You meant to set tail page refcount back to 1 at teardown if it was kept to 0 (e.g. > memunmap_pages() after put_page()) or that the refcount is indeed kept to zero after the > put_page() in memunmap_pages() ? The latter, i.e. would it be worth it to check that a tail page did not get accidentally pinned instead of a head page? I'm also ok to leave out that sanity checking for now.