On 2024/5/9 1:45, Jane Chu wrote: > On 5/8/2024 1:08 AM, Miaohe Lin wrote: > >> On 2024/5/7 4:26, Jane Chu wrote: >>> On 5/5/2024 12:00 AM, Miaohe Lin wrote: >>> >>>> On 2024/5/2 7:24, Jane Chu wrote: >>>>> When handle hwpoison in a GUP longterm pin'ed thp page, >>>>> try_to_split_thp_page() will fail. And at this point, there is little else >>>>> the kernel could do except sending a SIGBUS to the user process, thus >>>>> give it a chance to recover. >>>>> >>>>> Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx> >>>> Thanks for your patch. Some comments below. >>>> >>>>> --- >>>>> mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 36 insertions(+) >>>>> >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>> index 7fcf182abb96..67f4d24a98e7 100644 >>>>> --- a/mm/memory-failure.c >>>>> +++ b/mm/memory-failure.c >>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, >>>>> return rc; >>>>> } >>>>> +/* >>>>> + * The calling condition is as such: thp split failed, page might have >>>>> + * been GUP longterm pinned, not much can be done for recovery. >>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user >>>>> + * application has a chance to recover. Also, application processes' >>>>> + * election for MCE early killed will be honored. >>>>> + */ >>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, >>>>> + struct page *hpage) >>>>> +{ >>>>> + struct folio *folio = page_folio(hpage); >>>>> + LIST_HEAD(tokill); >>>>> + int res = -EHWPOISON; >>>>> + >>>>> + /* deal with user pages only */ >>>>> + if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p)) >>>>> + res = -EBUSY; >>>>> + if (!(PageLRU(hpage) || PageHuge(p))) >>>>> + res = -EBUSY; >>>> Above checks seems unneeded. We already know it's thp? >>> Agreed. >>> >>> I lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic, >>> >>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page. >>> >>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios. >>> >>>>> + >>>>> + if (res == -EHWPOISON) { >>>>> + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); >>>>> + kill_procs(&tokill, true, pfn, flags); >>>>> + } >>>>> + >>>>> + if (flags & MF_COUNT_INCREASED) >>>>> + put_page(p); >>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails? >>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should. >> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below: >> >> static int try_to_split_thp_page(struct page *page) >> { >> int ret; >> >> lock_page(page); >> ret = split_huge_page(page); >> unlock_page(page); >> >> if (unlikely(ret)) >> put_page(page); >> ^^^^^^^^^^^^^^^^^^^^^^^ >> return ret; >> } >> >> Or am I miss something? > > I think you caught a bug in my code, thanks! > > How about moving put_page() outside try_to_split_thp_page() ? If you want to send SIGBUS in the event of thp split fail, it might be required to do so. I think kill_procs_now() needs extra thp refcnt to do its work. > >> >>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not. >>> >>>>> + >>>> action_result is missing? >>> Indeed, action_result() isn't always called, referring to the re-accessing hwpoison scenarios. >>> >>> In this case, I think the reason is that, we just killed the process and there is nothing >>> >>> else to do or to report. >>> >>>>> + return res; >>>>> +} >>>>> + >>>>> /** >>>>> * memory_failure - Handle memory failure of a page. >>>>> * @pfn: Page Number of the corrupted page >>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags) >>>>> */ >>>>> SetPageHasHWPoisoned(hpage); >>>>> if (try_to_split_thp_page(p) < 0) { >>>> Should hwpoison_filter() be called in this case? >>> Yes, it should. I will add the hwpoison_filter check. >>>>> + if (flags & MF_ACTION_REQUIRED) { >> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it? > > I took a clue from kill_accessing_process() which is invoked only if MF_ACTION_REQUIRED is set. > > The usual code path for delivery signal is > > if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then > > - send SIGKILL if vaddr is -EFAULT > > - send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set > > - send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and process elected for MCE-early-kill > > So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is in the patch), one can argue that > > the MCE-early-kill request is not honored which deviates from the existing behavior. > > Perhaps I should remove the > > + if (flags & MF_ACTION_REQUIRED) { I tend to agree MCE-early-kill request should be honored when try to kill process. Thanks. .