From: Miaohe Lin <linmiaohe@xxxxxxxxxx> [ Upstream commit 888af2701db79b9b27c7e37f9ede528a5ca53b76 ] Patch series "A few fixup patches for memory failure", v2. This series contains a few patches to fix the race with changing page compound page, make non-LRU movable pages unhandlable and so on. More details can be found in the respective changelogs. There is a race window where we got the compound_head, the hugetlb page could be freed to buddy, or even changed to another compound page just before we try to get hwpoison page. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is introduced to record this event. [akpm@xxxxxxxxxxxxxxxxxxxx: s@/**@/*@, per Naoya Horiguchi] Link: https://lkml.kernel.org/r/20220312074613.4798-1-linmiaohe@xxxxxxxxxx Link: https://lkml.kernel.org/r/20220312074613.4798-2-linmiaohe@xxxxxxxxxx Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx> Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Cc: Tony Luck <tony.luck@xxxxxxxxx> Cc: Borislav Petkov <bp@xxxxxxxxx> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Cc: Yang Shi <shy828301@xxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> --- include/linux/mm.h | 1 + include/ras/ras_event.h | 1 + mm/memory-failure.c | 12 ++++++++++++ 3 files changed, 14 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 85205adcdd0d..7a80a08eec84 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3167,6 +3167,7 @@ enum mf_action_page_type { MF_MSG_BUDDY_2ND, MF_MSG_DAX, MF_MSG_UNSPLIT_THP, + MF_MSG_DIFFERENT_PAGE_SIZE, MF_MSG_UNKNOWN, }; diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h index 0bdbc0d17d2f..cac13ff1d6eb 100644 --- a/include/ras/ras_event.h +++ b/include/ras/ras_event.h @@ -376,6 +376,7 @@ TRACE_EVENT(aer_event, EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" ) \ EM ( MF_MSG_DAX, "dax page" ) \ EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ EMe ( MF_MSG_UNKNOWN, "unknown page" ) /* diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5664bafd5e77..a4d70c21c146 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -741,6 +741,7 @@ static const char * const action_page_types[] = { [MF_MSG_BUDDY_2ND] = "free buddy page (2nd try)", [MF_MSG_DAX] = "dax page", [MF_MSG_UNSPLIT_THP] = "unsplit thp", + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", [MF_MSG_UNKNOWN] = "unknown page", }; @@ -1461,6 +1462,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) } lock_page(head); + + /* + * The page could have changed compound pages due to race window. + * If this happens just bail out. + */ + if (!PageHuge(p) || compound_head(p) != head) { + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); + res = -EBUSY; + goto out; + } + page_flags = head->flags; /* -- 2.35.1