On Tue, Apr 12, 2022 at 07:08:45PM +0800, Miaohe Lin wrote:
...
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 9b76222ee237..771fb4fc626c 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1852,6 +1852,12 @@ int memory_failure(unsigned long pfn, int flags)
> >  	}
> >  
> >  	if (PageTransHuge(hpage)) {
> > +		if (is_huge_zero_page(hpage)) {
> > +			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> > +			res = -EBUSY;
> > +			goto unlock_mutex;
> > +		}
> > +
> 
> It seems that huge_zero_page could be handled simply by zapping the corresponding page
> table entries without losing any user data.

Yes, zapping all page table entries that map huge_zero_page is OK, and I think that
huge_zero_page should probably also be set to NULL.  The broken huge zero page holds no
user data, but it could hold corrupted data (unexpected non-zero bits), so it is safer to
replace it with a new zero page.  And get_huge_zero_page() seems to allocate a new huge
zero page if huge_zero_page is NULL when called, so callers would gracefully switch to
the new one on the first later access.

> Should we also try to handle this kind of page? Or just bail out as it's rare?

We should handle it if it's worth doing.  I think that memory errors on zero pages might
be rare events (because they occupy a small portion of physical memory), but if a zero
page is used by many processes, the impact of the error might be non-negligible.

Thanks,
Naoya Horiguchi
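
P.S. A rough, untested sketch of the "set huge_zero_page to NULL so a fresh page gets
allocated" idea, just to make the discussion concrete.  The helper name
retire_huge_zero_page() is made up for illustration; it assumes the static huge_zero_page
and huge_zero_pfn globals in mm/huge_memory.c, ignores huge_zero_refcount for simplicity,
and does not show the separate step of zapping PMDs that already map the old page:

/* mm/huge_memory.c -- hypothetical helper, sketch only */
void retire_huge_zero_page(void)
{
	/* Detach the global pointer so get_huge_zero_page() allocates a new page. */
	struct page *zero_page = xchg(&huge_zero_page, NULL);

	if (!zero_page)
		return;

	/* Stop is_huge_zero_page()/is_huge_zero_pmd() from matching the old page. */
	WRITE_ONCE(huge_zero_pfn, ~0UL);

	/*
	 * Deliberately leak the old page instead of calling __free_pages():
	 * it is hwpoisoned and must not go back to the page allocator.
	 * Existing PMD mappings of it would still have to be zapped separately.
	 */
}

/* mm/memory-failure.c -- possible caller, on top of the quoted patch */
	if (PageTransHuge(hpage)) {
		if (is_huge_zero_page(hpage)) {
			retire_huge_zero_page();	/* hypothetical */
			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
			res = -EBUSY;
			goto unlock_mutex;
		}
		...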