Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Mon, 23 Oct 2017 11:20:02 -0700

On 10/23/2017 12:32 AM, Naoya Horiguchi wrote:
> On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote:
>> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
>>> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
>>>
>>> Thank you for addressing this. The patch itself looks good to me, but
>>> the reported issue (negative reserve count) doesn't reproduce in my trial
>>> with v4.14-rc5, so could you share the exact procedure for this issue?
>>
>> Sure, but first one question on your test scenario below.
>>
>>>
>>> When error handler runs over a huge page, the reserve count is incremented
>>> so I'm not sure why the reserve count goes negative.
>>
>> I'm not sure I follow.  What specific code is incrementing the reserve
>> count?
> 
> The call path is like below:
> 
>   hugetlbfs_error_remove_page
>     hugetlb_fix_reserve_counts
>       hugepage_subpool_get_pages(spool, 1)
>         hugetlb_acct_memory(h, 1);
>           gather_surplus_pages
>             h->resv_huge_pages += delta;
> 

Ah OK.  This is a result of the call to hugetlb_fix_reserve_counts, which
I believe is incorrect in most instances and is unlikely to happen
with my patch.

>>
>> Remove the file (rm /var/opt/oracle/hugepool/foo)
>> -------------------------------------------------
>> HugePages_Total:       1
>> HugePages_Free:        0
>> HugePages_Rsvd:    18446744073709551615
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>>
>> I am still confused about how your test maintains a reserve count after
>> poisoning.  It may be a good idea for you to test my patch with your
>> test scenario as I cannot recreate it here.
> 
> Interestingly, I found that this reproduces if all hugetlb pages are
> reserved when poisoning.
> Your testing meets the condition, and mine doesn't.
> 
> In gather_surplus_pages() we determine whether to extend the hugetlb pool
> with surplus pages as follows:
> 
>     needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
>     if (needed <= 0) {
>             h->resv_huge_pages += delta;
>             return 0;
>     }
>     ...
> 
> needed is 1 if h->resv_huge_pages == h->free_huge_pages, and then
> the reserve count gets inconsistent.
> I confirmed that your patch fixes the issue, so I'm OK with it.

Thanks.  That now makes sense to me.

hugetlb_fix_reserve_counts (which results in gather_surplus_pages being
called) is only designed to be called in the extremely rare cases when
we have freed a huge page but are unable to free the reservation entry.

Just curious: when the hugetlb_fix_reserve_counts call was added to
hugetlbfs_error_remove_page, was the intention to preserve the original
reservation?  I remember thinking hard about that for the hole-punch
case and came to the conclusion that it was easier and less error-prone
to remove the reservation as well.  That will also happen in the error
case with the patch I provided.

-- 
Mike Kravetz

> 
> Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> 
> Thanks,
> Naoya Horiguchi
>