Re: hwpoison, shmem: fix data lost issue for 5.15.y

On Tue, Nov 22, 2022 at 5:05 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 11/15/22 01:16, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Nov 14, 2022 at 02:53:51PM -0800, Mike Kravetz wrote:
> > > On 11/15/22 07:39, Naoya Horiguchi wrote:
> > > > On Mon, Nov 14, 2022 at 05:11:35PM +0100, Greg KH wrote:
> > > > > On Mon, Nov 14, 2022 at 10:14:03PM +0900, Naoya Horiguchi wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'd like to request that the following commits be backported to 5.15.y.
> > > > > >
> > > > > > - dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
> > > > > > - 4966455d9100 ("mm: hwpoison: handle non-anonymous THP correctly")
> > > > > > - a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > > > > >
> > > > > > These patches fix a data loss issue by preventing shmem pagecache from
> > > > > > being removed on a memory error.  They were not tagged for stable originally,
> > > > > > but that decision was revisited recently.
> > > > >
> > > > > And have you tested that these all apply properly (and in which order?)
> > > >
> > > > Yes, I've checked that these cleanly apply (without any change) on
> > > > 5.15.78 in the above order (i.e. dd0f23 is first, 496645 comes next,
> > > > then a76054).
> > > >
> > > > > and work correctly?
> > > >
> > > > Yes, I ran the related testcases in my test suite, and their status changed
> > > > from FAIL to PASS with these patches.
> > >
> > > Hi Naoya,
> > >
> > > Just curious if you have plans to do backports for earlier releases?
> >
> > I didn't have a clear plan.  I just thought that we should backport to
> > earlier kernels if someone wants that and the patches apply easily
> > enough and are well tested.
> >
> > >
> > > If not, I can start that effort.  We have seen data loss/corruption because of
> > > this on a 4.14 based release.   So, I would go at least that far back.
> >
> > Thank you for raising your hand, that's really helpful.
> >
> > Maybe dd0f230a0a80 ("[PATCH] hugetlbfs: don't delete error page from
> > pagecache") should be considered for backporting together, because it
> > addresses a similar issue and was reported (a while ago) to fail to backport.
> > dd0f230a0a80 does not apply cleanly on top of 5.15.78 + the above 3 patches.
> > So I need to check further and will update my current proposal for 5.15.y.
>
> When working with 5.10.y, I noticed that commit eac96c3efdb5 ("mm: filemap:
> check if THP has hwpoisoned subpage for PMD page fault") as well as the
> prereq commit c7cb42e94473 ("mm: hwpoison: remove the unnecessary THP check")
> were not backported to 5.10.y.  Without those patches, THP testing will
> fail.
>
> Naoya and Yang Shi, does that sound right?

Yes. Since the hwpoisoned THP will be kept in the page cache, a page
fault may happen on it again; without that commit the page fault won't
return -EHWPOISON, if I remember correctly.

>
> I have backports for those as well but want to check if you think
> anything else is needed.

Thanks for backporting them. No further fixes are needed AFAICT.

> --
> Mike Kravetz



