Re: [PATCH 0/6] hwpoison, shmem, hugetlb: fix data loss issue 5.10.y

On 2022/11/24 AM3:54, Mike Kravetz wrote:
> This is a request for adding the following patches to stable 5.10.y.
> 
> Poisoned shmem and hugetlb pages are removed from the pagecache.
> Subsequent access to the offset in the file results in a NEW zero
> filled page.  Application code does not get notified of the data
> loss, and the only 'clue' is a message in the system log.  Data
> loss has been experienced by real users.
> 
> This was addressed upstream.  Most commits were marked for backports,
> but some were not.  This was discussed here [1] and here [2].
> 
> Patches apply cleanly to v5.4.224 and pass tests checking for this
> specific data loss issue.  LTP mm tests show no regressions.
> 
> All patches except 4 "mm: hwpoison: handle non-anonymous THP correctly"
> required a small bit of change to apply correctly: mostly for context.
> 
> linux-mm Cc'ed as it would be great to get at least an ACK from others
> familiar with this issue.
> 
> [1] https://lore.kernel.org/linux-mm/Y2UTUNBHVY5U9si2@monkey/
> [2] https://lore.kernel.org/stable/20221114131403.GA3807058@u2004/
> 
> James Houghton (1):
>   hugetlbfs: don't delete error page from pagecache
> 
> Yang Shi (5):
>   mm: hwpoison: remove the unnecessary THP check
>   mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>   mm: hwpoison: refactor refcount check handling
>   mm: hwpoison: handle non-anonymous THP correctly
>   mm: shmem: don't truncate page if memory failure happens
> 
>  fs/hugetlbfs/inode.c       |  13 ++--
>  include/linux/page-flags.h |  23 ++++++
>  mm/huge_memory.c           |   2 +
>  mm/hugetlb.c               |   4 +
>  mm/memory-failure.c        | 153 ++++++++++++++++++++++++-------------
>  mm/memory.c                |   9 +++
>  mm/page_alloc.c            |   4 +-
>  mm/shmem.c                 |  51 +++++++++++--
>  8 files changed, 191 insertions(+), 68 deletions(-)
> 

Hi, folks

Thank you for your effort. Silent data loss breaks data consistency for
end users, so it is critical that they are notified.

I applied this patch set to the 5.10.168 stable release [1] and ran the
mm_regression [2] test cases following the steps [3] provided by Naoya.
All four cases passed.

	#./run.sh project summary -p
	Project Name: debug
	PASS mm/hwpoison/shmem_link/link-hard.auto3
	PASS mm/hwpoison/shmem_link/link-sym.auto3
	PASS mm/hwpoison/shmem_rw/thp-always.auto3
	PASS mm/hwpoison/shmem_rw/thp-never.auto3
	Progress: 4 / 4 (100%)

Tested-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>

Cheers,
Shuai

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.10.168
[2] https://github.com/nhoriguchi/mm_regression
[3] https://lore.kernel.org/stable/20221116235842.GA62826@u2004/


