On 2024/5/2 7:24, Jane Chu wrote: > When handle hwpoison in a GUP longterm pin'ed thp page, > try_to_split_thp_page() will fail. And at this point, there is little else > the kernel could do except sending a SIGBUS to the user process, thus > give it a chance to recover. > > Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx> Thanks for your patch. Some comments below. > --- > mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 7fcf182abb96..67f4d24a98e7 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > return rc; > } > > +/* > + * The calling condition is as such: thp split failed, page might have > + * been GUP longterm pinned, not much can be done for recovery. > + * But a SIGBUS should be delivered with vaddr provided so that the user > + * application has a chance to recover. Also, application processes' > + * election for MCE early killed will be honored. > + */ > +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, > + struct page *hpage) > +{ > + struct folio *folio = page_folio(hpage); > + LIST_HEAD(tokill); > + int res = -EHWPOISON; > + > + /* deal with user pages only */ > + if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p)) > + res = -EBUSY; > + if (!(PageLRU(hpage) || PageHuge(p))) > + res = -EBUSY; Above checks seems unneeded. We already know it's thp? > + > + if (res == -EHWPOISON) { > + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); > + kill_procs(&tokill, true, pfn, flags); > + } > + > + if (flags & MF_COUNT_INCREASED) > + put_page(p); This if block is broken. put_page() has been done when try_to_split_thp_page() fails? > + action_result is missing? > + return res; > +} > + > /** > * memory_failure - Handle memory failure of a page. > * @pfn: Page Number of the corrupted page > @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags) > */ > SetPageHasHWPoisoned(hpage); > if (try_to_split_thp_page(p) < 0) { Should hwpoison_filter() be called in this case? > + if (flags & MF_ACTION_REQUIRED) { > + pr_err("%#lx: thp split failed\n", pfn); > + res = kill_procs_now(p, pfn, flags, hpage); Can we use hwpoison_user_mappings() directly here? Thanks. . > + goto unlock_mutex; > + } > res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); > goto unlock_mutex; > } >