On 2022/4/12 16:31, Oscar Salvador wrote: > On Sun, Apr 10, 2022 at 11:22:34PM +0800, Xu Yu wrote: >> Kernel panic when injecting memory_failure for the global huge_zero_page, >> when CONFIG_DEBUG_VM is enabled, as follows. > ... >> In fact, huge_zero_page is unhandlable currently in either soft offline >> or memory failure injection. With CONFIG_DEBUG_VM disabled, >> huge_zero_page is bailed out when checking HWPoisonHandlable() in >> get_any_page(), or checking page mapping in split_huge_page_to_list(). >> >> This makes huge_zero_page bail out early in madvise_inject_error(), and >> panic above won't happen again. > > I would not special case this in madvise_inject_error() but rather > handle it in memory-failure code. > We do already have HWPoisonHandlable(), which tells us whether the page > is of a type we can really do something about, so why not add another > check in HWPoisonHandlable() for huge_zero_page(), and have that checked > in memory_failure(). IIUC, this does not work. Because HWPoisonHandlable is only called in !MF_COUNT_INCREASED case. But MF_COUNT_INCREASED is always set when called from madvise_inject_error, so HWPoisonHandlable is not even called in this scene. Or am I miss something? BTW: IIRC, LRU isn't set on huge_zero_page. So the origin HWPoisonHandlable can already filter out this page. Thanks! > Something like (untested): > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index dcb6bb9cf731..dccd0503f803 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1181,6 +1181,10 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags) > { > bool movable = false; > > + /* Can't handle huge_zero_page() */ > + if(is_huge_zero_page(compound_head(page))) > + return false; > + > /* Soft offline could mirgate non-LRU movable pages */ > if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page)) > movable = true; > @@ -1796,6 +1800,10 @@ int memory_failure(unsigned long pfn, int flags) > res = -EBUSY; > goto unlock_mutex; > } > + } else if(!HWPoisonHandable(p)) { > + action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED); > + res = -EBUSY; > + goto unlock_mutex; > } > > if (PageTransHuge(hpage)) { > > It can certainly be prettier, but you can get the idea. > > >> >> Reported-by: Abaci <abaci@xxxxxxxxxxxxxxxxx> >> Signed-off-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx> >> --- >> mm/madvise.c | 13 +++++++++++-- >> 1 file changed, 11 insertions(+), 2 deletions(-) >> >> diff --git a/mm/madvise.c b/mm/madvise.c >> index 1873616a37d2..03ad50d222e0 100644 >> --- a/mm/madvise.c >> +++ b/mm/madvise.c >> @@ -1079,7 +1079,7 @@ static int madvise_inject_error(int behavior, >> >> for (; start < end; start += size) { >> unsigned long pfn; >> - struct page *page; >> + struct page *page, *head; >> int ret; >> >> ret = get_user_pages_fast(start, 1, 0, &page); >> @@ -1087,12 +1087,21 @@ static int madvise_inject_error(int behavior, >> return ret; >> pfn = page_to_pfn(page); >> >> + head = compound_head(page); >> + if (unlikely(is_huge_zero_page(head))) { >> + pr_warn("Unhandlable attempt to %s pfn %#lx at process virtual address %#lx\n", >> + behavior == MADV_SOFT_OFFLINE ? "soft offline" : >> + "inject memory failure for", >> + pfn, start); >> + return -EINVAL; >> + } >> + >> /* >> * When soft offlining hugepages, after migrating the page >> * we dissolve it, therefore in the second loop "page" will >> * no longer be a compound page. >> */ >> - size = page_size(compound_head(page)); >> + size = page_size(head); >> >> if (behavior == MADV_SOFT_OFFLINE) { >> pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n", >> -- >> 2.20.1.2432.ga663e714 >> >> >