On Mon, Nov 14, 2022 at 05:30:29PM -0800, Mike Kravetz wrote: > On 11/15/22 01:16, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Mon, Nov 14, 2022 at 02:53:51PM -0800, Mike Kravetz wrote: > > > On 11/15/22 07:39, Naoya Horiguchi wrote: > > > > On Mon, Nov 14, 2022 at 05:11:35PM +0100, Greg KH wrote: > > > > > On Mon, Nov 14, 2022 at 10:14:03PM +0900, Naoya Horiguchi wrote: > > > > > > Hi, > > > > > > > > > > > > I'd like to request the follow commits to be backported to 5.15.y. > > > > > > > > > > > > - dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling") > > > > > > - 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly") > > > > > > - a76054266661 ("mm: shmem: don't truncate page if memory failure happens") > > > > > > > > > > > > These patches fixed a data lost issue by preventing shmem pagecache from > > > > > > being removed by memory error. These were not tagged for stable originally, > > > > > > but that's revisited recently. > > > > > > > > > > And have you tested that these all apply properly (and in which order?) > > > > > > > > Yes, I've checked that these cleanly apply (without any change) on > > > > 5.15.78 in the above order (i.e. dd0f23 is first, 496645 comes next, > > > > then a76054). > > > > > > > > > and work correctly? > > > > > > > > Yes, I ran related testcases in my test suite, and their status changed > > > > FAIL to PASS with these patches. > > > > > > Hi Naoya, > > > > > > Just curious if you have plans to do backports for earlier releases? > > > > I didn't have a clear plan. I just thought that we should backport to > > earlier kernels if someone want and the patches are applicable easily > > enough and well-tested. > > > > > > > > If not, I can start that effort. We have seen data loss/corruption because of > > > this on a 4.14 based release. So, I would go at least that far back. > > > > Thank you for raising hand, that's really helpful. > > > > Maybe dd0f230a0a80 ("[PATCH] hugetlbfs: don't delete error page from # I meant 8625147cafaa, sorry if the wrong commit ID confused you. I tested with 8625147cafaa too, and it made hugetlb-related testcases passed. > > pagecbache") should be considered to backport together, because it's > > the similar issue and reported (a while ago) to fail to backport. > > Since dd0f230a0a80 was marked for backports, Greg's automation flags it as > FAILED due to conflicts in earlier releases. I am not sure if James has > a plan to do backports for dd0f230a0a80. Again, this is also something I > would help with due to real customer issues. OK, so I do want 8625147cafaa to be merged to stable. Another reason why I addressed 8625147cafaa here is that this patch depends on the code handling "extra_pins" (given by dd0f230a0a80) and it's simpler to handle together. We need to slightly modify 8625147cafaa to apply to 5.15.y. So in summary, my updated suggestion for 5.15.y is like below: - [1/4] cherry-pick dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling") - [2/4] cherry-pick 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly") - [3/4] cherry-pick a76054266661 ("mm: shmem: don't truncate page if memory failure happens") - [4/4] apply the following patch (as a modified version of 8625147cafaa) Thanks, Naoya Horiguchi --- >From bdd88699d8ba6907e0815608482f8c0e2170982d Mon Sep 17 00:00:00 2001 From: James Houghton <jthoughton@xxxxxxxxxx> Date: Tue, 15 Nov 2022 15:20:51 +0900 Subject: [PATCH] hugetlbfs: don't delete error page from pagecache commit 8625147cafaa9ba74713d682f5185eb62cb2aedb upstream. This change is very similar to the change that was made for shmem [1], and it solves the same problem but for HugeTLBFS instead. Currently, when poison is found in a HugeTLB page, the page is removed from the page cache. That means that attempting to map or read that hugepage in the future will result in a new hugepage being allocated instead of notifying the user that the page was poisoned. As [1] states, this is effectively memory corruption. The fix is to leave the page in the page cache. If the user attempts to use a poisoned HugeTLB page with a syscall, the syscall will fail with EIO, the same error code that shmem uses. For attempts to map the page, the thread will get a BUS_MCEERR_AR SIGBUS. [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens") Although there's a conflict in hugetlbfs_error_remove_page() when backporting to stable tree, the resolution is almost trivial (just removing conflicting lines in the function). Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@xxxxxxxxxx Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Tested-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Reviewed-by: Yang Shi <shy828301@xxxxxxxxx> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx> Cc: James Houghton <jthoughton@xxxxxxxxxx> Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx> Cc: Muchun Song <songmuchun@xxxxxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- fs/hugetlbfs/inode.c | 13 ++++++------- mm/hugetlb.c | 4 ++++ mm/memory-failure.c | 5 ++++- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index d74a49b188c2..be8deec29ebe 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -361,6 +361,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to) } else { unlock_page(page); + if (PageHWPoison(page)) { + put_page(page); + retval = -EIO; + break; + } + /* * We have the page, copy it to user space buffer. */ @@ -984,13 +990,6 @@ static int hugetlbfs_migrate_page(struct address_space *mapping, static int hugetlbfs_error_remove_page(struct address_space *mapping, struct page *page) { - struct inode *inode = mapping->host; - pgoff_t index = page->index; - - remove_huge_page(page); - if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1))) - hugetlb_fix_reserve_counts(inode); - return 0; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index dbb63ec3b5fa..e7bd42f23667 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5350,6 +5350,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, ptl = huge_pte_lockptr(h, dst_mm, dst_pte); spin_lock(ptl); + ret = -EIO; + if (PageHWPoison(page)) + goto out_release_unlock; + /* * Recheck the i_size after holding PT lock to make sure not * to leave any page mapped (as page_mapped()) beyond the end diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 85b1a77e3a99..2ad0f4580091 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1040,6 +1040,7 @@ static int me_huge_page(struct page_state *ps, struct page *p) int res; struct page *hpage = compound_head(p); struct address_space *mapping; + bool extra_pins = false; if (!PageHuge(hpage)) return MF_DELAYED; @@ -1047,6 +1048,8 @@ static int me_huge_page(struct page_state *ps, struct page *p) mapping = page_mapping(hpage); if (mapping) { res = truncate_error_page(hpage, page_to_pfn(p), mapping); + /* The page is kept in page cache. */ + extra_pins = true; unlock_page(hpage); } else { res = MF_FAILED; @@ -1064,7 +1067,7 @@ static int me_huge_page(struct page_state *ps, struct page *p) } } - if (has_extra_refcount(ps, p, false)) + if (has_extra_refcount(ps, p, extra_pins)) res = MF_FAILED; return res; -- 2.37.3.518.g79f2338b37