When we dump core for a user process signal, the signal may be a SIGBUS
with code BUS_MCEERR_AR or BUS_MCEERR_AO, which means it resulted from
an ECC memory failure such as SRAR or SRAO. We expect the memory
recovery work to have finished correctly, in which case get_dump_page()
will not return the error page, because memory_failure() has set the
process's pte for it invalid.

But memory_failure() may fail, and the process's related pte may not be
correctly set invalid. With the current code, get_dump_page() then
returns the poisoned page, the page gets dumped, and the system panics
because the access happens in kernel code.

So check the poison status in get_dump_page() and return NULL if the
page is poisoned.

Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
---
 mm/gup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index e4c224c..499a496 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1536,6 +1536,14 @@ struct page *get_dump_page(unsigned long addr)
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	if (locked)
 		mmap_read_unlock(mm);
+
+	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) && ret == 1) {
+		if (unlikely(PageHuge(page) && PageHWPoison(compound_head(page))))
+			ret = 0;
+		else if (unlikely(PageHWPoison(page)))
+			ret = 0;
+	}
+
 	return (ret == 1) ? page : NULL;
 }
 #endif /* CONFIG_ELF_CORE */
-- 
1.8.3.1
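
For readers who want to see where BUS_MCEERR_AR and BUS_MCEERR_AO show up from user space, the following is a minimal sketch and not part of the patch. The helper names sigbus_is_memory_failure(), on_sigbus() and install_sigbus_handler() are hypothetical and chosen only for illustration. The sketch assumes a Linux libc and falls back to the uapi values 4 and 5 if the headers do not define the two codes.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks for libc headers that lack these codes (Linux uapi values). */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: 1 if si_code reports a hardware memory error. */
int sigbus_is_memory_failure(int si_code)
{
	return si_code == BUS_MCEERR_AR || si_code == BUS_MCEERR_AO;
}

/* Hypothetical handler: log memory-failure SIGBUS, then fall back to default. */
static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	static const char msg[] = "SIGBUS: hardware memory error reported\n";
	ssize_t n;

	(void)sig;
	(void)ucontext;

	if (sigbus_is_memory_failure(info->si_code)) {
		n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
		(void)n;
	}

	/* Restore the default action and re-raise; the signal is delivered
	 * with the default disposition once this handler returns. */
	signal(SIGBUS, SIG_DFL);
	raise(SIGBUS);
}

/* Hypothetical helper: install the handler; returns 0 on success. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;	/* needed to receive siginfo_t */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call install_sigbus_handler() once at startup. Whether a core file is actually written after the re-raise depends on the usual core dump settings (for example RLIMIT_CORE), which this sketch does not touch.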