On Tue, Apr 12, 2022 at 07:08:45PM +0800, Miaohe Lin wrote:
...
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 9b76222ee237..771fb4fc626c 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1852,6 +1852,12 @@ int memory_failure(unsigned long pfn, int flags)
> >  	}
> >  
> >  	if (PageTransHuge(hpage)) {
> > +		if (is_huge_zero_page(hpage)) {
> > +			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> > +			res = -EBUSY;
> > +			goto unlock_mutex;
> > +		}
> > +
> 
> It seems that huge_zero_page could be handled simply by zapping the corresponding page
> table entries without losing any user data.

Yes, zapping all page table entries that map huge_zero_page is OK, and I think that
huge_zero_page should probably also be set to NULL.  The broken huge zero page holds no
user data, but it could hold corrupted data (unexpected non-zero bits), so it is safer to
replace it with a new zero page.  And get_huge_zero_page() seems to allocate a new huge
zero page if huge_zero_page is NULL when called, so callers would gracefully switch to
the new one on the first later access.

> Should we also try to handle this kind of page? Or just bail out as it's rare?

We should handle it if it's worth doing.  I think that memory errors on zero pages might
be rare events (because they occupy a small portion of physical memory), but if a zero
page is used by many processes, the impact of the error might be non-negligible.

Thanks,
Naoya Horiguchi
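
P.S. A rough, untested sketch of the "set huge_zero_page to NULL so a fresh page gets
allocated" idea, just to make the discussion concrete.  The helper name
retire_huge_zero_page() is made up for illustration; it assumes the static huge_zero_page
and huge_zero_pfn globals in mm/huge_memory.c, ignores huge_zero_refcount for simplicity,
and does not show the separate step of zapping PMDs that already map the old page:

/* mm/huge_memory.c -- hypothetical helper, sketch only */
void retire_huge_zero_page(void)
{
	/* Detach the global pointer so get_huge_zero_page() allocates a new page. */
	struct page *zero_page = xchg(&huge_zero_page, NULL);

	if (!zero_page)
		return;

	/* Stop is_huge_zero_page()/is_huge_zero_pmd() from matching the old page. */
	WRITE_ONCE(huge_zero_pfn, ~0UL);

	/*
	 * Deliberately leak the old page instead of calling __free_pages():
	 * it is hwpoisoned and must not go back to the page allocator.
	 * Existing PMD mappings of it would still have to be zapped separately.
	 */
}

/* mm/memory-failure.c -- possible caller, on top of the quoted patch */
	if (PageTransHuge(hpage)) {
		if (is_huge_zero_page(hpage)) {
			retire_huge_zero_page();	/* hypothetical */
			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
			res = -EBUSY;
			goto unlock_mutex;
		}
		...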