On 2022/4/13 16:36, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Apr 12, 2022 at 07:08:45PM +0800, Miaohe Lin wrote:
> ...
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 9b76222ee237..771fb4fc626c 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1852,6 +1852,12 @@ int memory_failure(unsigned long pfn, int flags)
>>>  	}
>>>  
>>>  	if (PageTransHuge(hpage)) {
>>> +		if (is_huge_zero_page(hpage)) {
>>> +			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>>> +			res = -EBUSY;
>>> +			goto unlock_mutex;
>>> +		}
>>> +
>>
>> It seems that huge_zero_page could be handled simply by zapping the
>> corresponding page table entries, without losing any user data.
>
> Yes, zapping all page table entries to huge_zero_page is OK, and I think
> that maybe huge_zero_page should be set to NULL. The broken huge_zero page
> has no user data, but could have corrupted data (with unexpected non-zero
> bits), so it's safer to replace it with a new zero page. And
> get_huge_zero_page() seems to allocate a new huge zero page if
> huge_zero_page is NULL when called, so it would be gracefully switched
> to a new one on the first later access.

Agree. A rough sketch of that idea is at the end of this mail.

>
>> Should we also try to handle this kind of page? Or just bail out as it's rare?
>
> We should handle it if it's worth doing. I think that memory errors on zero
> pages might be rare events (because they occupy a small portion of physical
> memory). But if zero pages could be used by many processes, the impact of the
> error might be non-negligible.

Yes, when this becomes non-negligible, we could handle it. :)

Thanks.

>
> Thanks,
> Naoya Horiguchi
>
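
For reference, below is a rough, untested sketch of the "set huge_zero_page
to NULL so the next user gets a fresh one" idea discussed above. It is only
meant to show the direction, not a real patch: the helper name
retire_huge_zero_page() is made up for illustration, it ignores
huge_zero_refcount and the huge zero page shrinker, and zapping the existing
mappings is left to the normal hwpoison unmap path.

/*
 * Sketch only: would have to live in mm/huge_memory.c, where
 * huge_zero_page is defined.
 */
void retire_huge_zero_page(void)
{
	struct page *zero_page;

	/*
	 * Drop the cached global pointer. Later callers of
	 * get_huge_zero_page() see NULL and allocate a fresh,
	 * really-zeroed huge page.
	 */
	zero_page = xchg(&huge_zero_page, NULL);
	if (!zero_page)
		return;

	/*
	 * Deliberately do not __free_pages() here: the page may hold
	 * corrupted (non-zero) data and is marked HWPoison by
	 * memory_failure(), so it must never be reused. Refcount
	 * handling and unmapping of existing users are omitted in
	 * this sketch.
	 */
}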