Re: [PATCH v3 12/14] device-dax: compound pagemap support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/28/21 12:51 AM, Dan Williams wrote:
> On Thu, Jul 15, 2021 at 5:01 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>> On 7/15/21 12:36 AM, Dan Williams wrote:
>>> On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>> This patch is not the culprit, the flaw is early in the series, specifically the fourth patch.
>>
>> It needs this chunk below change on the fourth patch due to the existing elevated page ref
>> count at zone device memmap init. put_page() called here in memunmap_pages():
>>
>> for (i = 0; i < pgmap->nr_ranges; i++)
>>         for_each_device_pfn(pfn, pgmap, i)
>>                 put_page(pfn_to_page(pfn));
>>
>> ... on a zone_device compound memmap would otherwise always decrease head page refcount by
>> @geometry pfn amount (leading to the aforementioned splat you reported).
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index b0e7b8cf3047..79a883af788e 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -102,15 +102,15 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
>>         return (range->start + range_len(range)) >> PAGE_SHIFT;
>>  }
>>
>> -static unsigned long pfn_next(unsigned long pfn)
>> +static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn)
>>  {
>>         if (pfn % 1024 == 0)
>>                 cond_resched();
>> -       return pfn + 1;
>> +       return pfn + pgmap_pfn_geometry(pgmap);
> 
> The cond_resched() would need to be fixed up too to something like:
> 
> if (pfn % (1024 << pgmap_geometry_order(pgmap)))
>     cond_resched();
> 
> ...because the goal is to take a break every 1024 iterations, not
> every 1024 pfns.
> 

Ah, good point.

>>  }
>>
>>  #define for_each_device_pfn(pfn, map, i) \
>> -       for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
>> +       for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
>>
>>  static void dev_pagemap_kill(struct dev_pagemap *pgmap)
>>  {
>>
>> It could also get this hunk below, but it is sort of redundant provided we won't touch
>> tail page refcount through out the devmap pages lifetime. This setting of tail pages
>> refcount to zero was in pre-v5.14 series, but it got removed under the assumption it comes
>> from the page allocator (where tail pages are already zeroed in refcount).
> 
> Wait, devmap pages never see the page allocator?
> 
"where tail pages are already zeroed in refcount" this actually meant 'freshly allocated
pages' and I was referring to commit 7118fc2906e2 ("hugetlb: address ref count racing in
prep_compound_gigantic_page") that removed set_page_count() because the setting of page
ref count to zero was redundant.

Albeit devmap pages don't come from page allocator, you know separate zone and these pages
aren't part of the regular page pools (e.g. accessible via alloc_pages()), as you are
aware. Unless of course, we reassign them via dax_kmem, but then the way we map the struct
pages would be regular without any devmap stuff.

>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 96975edac0a8..469a7aa5cf38 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6623,6 +6623,7 @@ static void __ref memmap_init_compound(struct page *page, unsigned
>> long pfn,
>>                 __init_zone_device_page(page + i, pfn + i, zone_idx,
>>                                         nid, pgmap);
>>                 prep_compound_tail(page, i);
>> +               set_page_count(page + i, 0);
> 
> Looks good to me and perhaps a for elevated tail page refcount at
> teardown as a sanity check that the tail pages was never pinned
> directly?
> 
Sorry didn't follow completely.

You meant to set tail page refcount back to 1 at teardown if it was kept to 0 (e.g.
memunmap_pages() after put_page()) or that the refcount is indeed kept to zero after the
put_page() in memunmap_pages() ?



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux