Re: hwpoison, shmem: fix data lost issue for 5.15.y

On Tue, Nov 15, 2022 at 07:48:42PM -0800, Mike Kravetz wrote:
> On 11/15/22 15:39, Naoya Horiguchi wrote:
> > On Mon, Nov 14, 2022 at 05:30:29PM -0800, Mike Kravetz wrote:
> > > On 11/15/22 01:16, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > On Mon, Nov 14, 2022 at 02:53:51PM -0800, Mike Kravetz wrote:
> > > > > On 11/15/22 07:39, Naoya Horiguchi wrote:
> > > > > > On Mon, Nov 14, 2022 at 05:11:35PM +0100, Greg KH wrote:
> > > > > > > On Mon, Nov 14, 2022 at 10:14:03PM +0900, Naoya Horiguchi wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > I'd like to request the following commits to be backported to 5.15.y.
> > > > > > > > 
> > > > > > > > - dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
> > > > > > > > - 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
> > > > > > > > - a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > > > > > > > 
> > > > > > > > These patches fixed a data loss issue by preventing shmem pagecache from
> > > > > > > > being removed by a memory error.  These were not tagged for stable originally,
> > > > > > > > but that decision has been revisited recently.
> > > > > > > 
> > > > > > > And have you tested that these all apply properly (and in which order?)
> > > > > > 
> > > > > > Yes, I've checked that these cleanly apply (without any change) on
> > > > > > 5.15.78 in the above order (i.e. dd0f23 is first, 496645 comes next,
> > > > > > then a76054).
> > > > > > 
> > > > > > > and work correctly?
> > > > > > 
> > > > > > Yes, I ran related testcases in my test suite, and their status changed
> > > > > > from FAIL to PASS with these patches.
> > > > > 
> > > > > Hi Naoya,
> > > > > 
> > > > > Just curious if you have plans to do backports for earlier releases?
> > > > 
> > > > I didn't have a clear plan.  I just thought that we should backport to
> > > > earlier kernels if someone wants it, the patches apply easily enough,
> > > > and they are well-tested.
> > > > 
> > > > > 
> > > > > If not, I can start that effort.  We have seen data loss/corruption because of
> > > > > this on a 4.14-based release.  So, I would go at least that far back.
> > > > 
> > > > Thank you for raising your hand; that's really helpful.
> > > > 
> > > > Maybe dd0f230a0a80 ("[PATCH] hugetlbfs: don't delete error page from
> > 
> > # I meant 8625147cafaa, sorry if the wrong commit ID confused you.
> > 
> > I tested with 8625147cafaa too, and it made the hugetlb-related testcases
> > pass.
> <snip>
> > We need to slightly modify 8625147cafaa to apply to 5.15.y.  So in summary,
> > my updated suggestion for 5.15.y is like below:
> > 
> > - [1/4] cherry-pick dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
> > - [2/4] cherry-pick 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
> > - [3/4] cherry-pick a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > - [4/4] apply the following patch (as a modified version of 8625147cafaa)
> 
> Hi Naoya,
> 
> Just curious, do you have automated tests for this?  I want to test backports
> to each stable release.  I could test manually, but that would be a bit
> involved, and I was hoping you had something automated.

Yes, related testcases are available at https://github.com/nhoriguchi/mm_regression.
You can run them with the following steps:

  $ git clone https://github.com/nhoriguchi/mm_regression
  $ cd mm_regression

  # Check that your testing server meets the prerequisite
  # https://github.com/nhoriguchi/mm_regression#prerequisite    

  $ make
  ...
  # The compiler might show errors, but that's OK because not all
  # files are needed to run the relevant testcases.

  $ bash run.sh prepare debug

  # Make sure work/debug/recipelist lists the testcases like below:

  $ cat work/debug/recipelist
  mm/hwpoison/shmem_link/link-hard.auto3
  mm/hwpoison/shmem_link/link-sym.auto3
  mm/hwpoison/shmem_rw/thp-always.auto3
  mm/hwpoison/shmem_rw/thp-never.auto3

  $ bash run.sh project run
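For repeating this on several stable trees, the backport order and the
recipelist check could be scripted.  What follows is only a minimal bash
sketch, assuming a stable tree checked out on a branch based on 5.15.y
(for example v5.15.78).  The names backport_hwpoison_fixes,
check_recipelist, HWPOISON_SHMEM_FIXES, HWPOISON_SHMEM_TESTS and DRY_RUN
are hypothetical helpers, not part of mm_regression.  The fourth step (the
modified version of 8625147cafaa) is not an upstream commit and still has
to be applied by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketch: names below are illustrative only.

# Upstream fixes, in the order they are cherry-picked onto 5.15.y:
#   dd0f230a0a80  mm: hwpoison: refactor refcount check handling
#   4966455d9100  mm: hwpoison: handle non-anonymous THP correctly
#   a76054266661  mm: shmem: don't truncate page if memory failure happens
HWPOISON_SHMEM_FIXES=(dd0f230a0a80 4966455d9100 a76054266661)

# Testcases expected in work/debug/recipelist (from the listing above).
HWPOISON_SHMEM_TESTS=(
  mm/hwpoison/shmem_link/link-hard.auto3
  mm/hwpoison/shmem_link/link-sym.auto3
  mm/hwpoison/shmem_rw/thp-always.auto3
  mm/hwpoison/shmem_rw/thp-never.auto3
)

# Cherry-pick the fixes in order onto the current branch.
# If DRY_RUN is non-empty, only print the git commands instead.
backport_hwpoison_fixes() {
  local c
  for c in "${HWPOISON_SHMEM_FIXES[@]}"; do
    if [[ -n "${DRY_RUN:-}" ]]; then
      echo "git cherry-pick -x $c"
    else
      # -x appends "(cherry picked from commit ...)" to the message.
      # Stop at the first failure (e.g. a conflict on an older tree).
      git cherry-pick -x "$c" || return 1
    fi
  done
}

# Check that a recipelist file contains every expected testcase as a
# whole line.  Prints each missing entry; returns 1 if any is missing.
check_recipelist() {
  local list="$1" t missing=0
  for t in "${HWPOISON_SHMEM_TESTS[@]}"; do
    if ! grep -qxF "$t" "$list"; then
      echo "missing: $t"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage would be roughly: source the script in the stable tree, run
backport_hwpoison_fixes, apply the modified 8625147cafaa patch separately,
then build and boot the kernel.  On the test machine, after
"bash run.sh prepare debug", run "check_recipelist work/debug/recipelist"
before "bash run.sh project run".  On trees older than 5.15.y the
cherry-picks may not apply cleanly, which is why the function stops at
the first failure instead of continuing.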

Thanks,
Naoya Horiguchi
