On 2022/4/13 16:36, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Apr 12, 2022 at 07:08:45PM +0800, Miaohe Lin wrote:
> ...
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 9b76222ee237..771fb4fc626c 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1852,6 +1852,12 @@ int memory_failure(unsigned long pfn, int flags)
>>>  	}
>>>  
>>>  	if (PageTransHuge(hpage)) {
>>> +		if (is_huge_zero_page(hpage)) {
>>> +			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>>> +			res = -EBUSY;
>>> +			goto unlock_mutex;
>>> +		}
>>> +
>>
>> It seems that huge_zero_page could be handled simply by zapping the
>> corresponding page table entries, without losing any user data.
>
> Yes, zapping all page table entries to huge_zero_page is OK, and I think
> that maybe huge_zero_page should be set to NULL. The broken huge_zero page
> has no user data, but could have corrupted data (with unexpected non-zero
> bits), so it's safer to replace it with a new zero page. And
> get_huge_zero_page() seems to allocate a new huge zero page if
> huge_zero_page is NULL when called, so it would be gracefully switched
> to a new one on the first later access.

Agree. A rough sketch of that idea is at the end of this mail.

>
>> Should we also try to handle this kind of page? Or just bail out as it's rare?
>
> We should handle it if it's worth doing. I think that memory errors on zero
> pages might be rare events (because they occupy a small portion of physical
> memory). But if zero pages could be used by many processes, the impact of the
> error might be non-negligible.

Yes, when this becomes non-negligible, we could handle it. :)

Thanks.

>
> Thanks,
> Naoya Horiguchi
>
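
For reference, below is a rough, untested sketch of the "set huge_zero_page
to NULL so the next user gets a fresh one" idea discussed above. It is only
meant to show the direction, not a real patch: the helper name
retire_huge_zero_page() is made up for illustration, it ignores
huge_zero_refcount and the huge zero page shrinker, and zapping the existing
mappings is left to the normal hwpoison unmap path.

/*
 * Sketch only: would have to live in mm/huge_memory.c, where
 * huge_zero_page is defined.
 */
void retire_huge_zero_page(void)
{
	struct page *zero_page;

	/*
	 * Drop the cached global pointer. Later callers of
	 * get_huge_zero_page() see NULL and allocate a fresh,
	 * really-zeroed huge page.
	 */
	zero_page = xchg(&huge_zero_page, NULL);
	if (!zero_page)
		return;

	/*
	 * Deliberately do not __free_pages() here: the page may hold
	 * corrupted (non-zero) data and is marked HWPoison by
	 * memory_failure(), so it must never be reused. Refcount
	 * handling and unmapping of existing users are omitted in
	 * this sketch.
	 */
}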