Changes in v4:
  - collected R-B from Oscar Salvador
  - collected Acked-by from Miaohe Lin
  - fixed comment on MF_DELAYED, and comments for better coding.
    - Miaohe Lin

Changes in v3:
  - rebased to mainline as of 5/20/2024
  - added an Acked-by from Miaohe Lin
  - picked up an R-B from Oscar Salvador
  - fixed/clarified comments about MF_IGNORED/MF_FAILED definition and
    usage.
    - Oscar Salvador
  - invoke hwpoison_filter() slightly earlier to avoid an unnecessary THP
    split, and with the refcount held.
    - Miaohe Lin
  - added comments to try_to_split_thp_page() on when not to release the
    page refcount.
    - Oscar Salvador
  - added action_result() in a couple of cases, taking care not to
    overwrite the intended return values.
    - Oscar Salvador

Changes in v2:
  - rebased to mm-stable as of 5/8/2024
  - added R-B by Oscar Salvador
  - comments from Oscar on patch 1-of-3: clarify changelog
  - comments from Miaohe Lin on patch 3-of-3: remove unnecessary user
    page checking and remove incorrect put_page() in kill_procs_now().
    Invoke kill_procs_now() regardless of whether MF_ACTION_REQUIRED is
    set, and move hwpoison_filter() higher up.
  - added two patches, 3-of-5 and 4-of-5

This series aims at the following enhancements:

- Let one hwpoison injector, madvise(MADV_HWPOISON), behave more as if a
  real UE had occurred. The other two injectors, hwpoison-inject and
  'einj' on x86, can't, and it seems to me we need a better simulation
  of a real UE scenario.

- For years, if the kernel was unable to unmap a hwpoisoned page, it sent
  a SIGKILL instead of a SIGBUS to prevent the user process from
  potentially accessing the page again. But in doing so, the user process
  also loses important information needed for recovery: the vaddr.
  Fortunately, the kernel already has code to kill a process that
  re-accesses a hwpoisoned page, so remove the '!unmap_success' check.

- Right now, if a THP under a GUP longterm pin is hwpoisoned and the
  kernel cannot split the THP, memory-failure simply ignores the UE and
  returns. That's not ideal; it could deliver a SIGBUS with useful
  information for userspace recovery.

Jane Chu (5):
  mm/memory-failure: try to send SIGBUS even if unmap failed
  mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  mm/memory-failure: improve memory failure action_result messages
  mm/memory-failure: move hwpoison_filter() higher up
  mm/memory-failure: send SIGBUS in the event of thp split fail

 include/linux/mm.h      |   2 +
 include/ras/ras_event.h |   2 +
 mm/madvise.c            |   2 +-
 mm/memory-failure.c     | 106 +++++++++++++++++++++++++++++-----------
 4 files changed, 82 insertions(+), 30 deletions(-)

-- 
2.39.3