On 2022/12/10 8:50, Andrew Morton wrote:
On Fri, 9 Dec 2022 15:28:01 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
When the kernel copies a page in ksm_might_need_to_copy() and runs
into an uncorrectable error, it crashes because the poisoned page is
consumed by the kernel. This is similar to copy-on-write poison recovery:
when an error is detected during the page copy, return VM_FAULT_HWPOISON,
which helps us avoid a system crash. Note that memory failure on a KSM
page will be skipped, but memory_failure_queue() is still called to be
consistent with the general memory failure process.
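
The recovery path described above can be sketched in user space roughly as
follows. This is only an illustrative model, not the patch itself: every
sim_* name, the sim_page structure and the fault codes are hypothetical
stand-ins, and the "poison" is simulated with an offset rather than a real
machine check. It assumes a copy helper that reports how many bytes it
could not copy, with 0 meaning success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u

/* Hypothetical fault codes for this sketch; not the kernel's values. */
enum sim_fault { SIM_FAULT_OK = 0, SIM_FAULT_HWPOISON = 1 };

/* Simulated source page; poison_off < 0 means no poisoned byte. */
struct sim_page {
	unsigned char data[SIM_PAGE_SIZE];
	long poison_off;
};

/* Counts how many failures were queued for later handling. */
static unsigned int sim_queued_failures;

/* Stand-in for queueing the bad page for memory-failure handling. */
static void sim_memory_failure_queue(const struct sim_page *page)
{
	(void)page;
	sim_queued_failures++;
}

/*
 * Stand-in for a machine-check-safe copy: copies up to the first
 * poisoned byte and returns the number of bytes NOT copied.
 */
static size_t sim_copy_mc(unsigned char *dst, const struct sim_page *src)
{
	size_t good = SIM_PAGE_SIZE;

	assert(dst != NULL && src != NULL);
	if (src->poison_off >= 0 && (size_t)src->poison_off < SIM_PAGE_SIZE)
		good = (size_t)src->poison_off;
	memcpy(dst, src->data, good);
	return SIM_PAGE_SIZE - good;
}

/*
 * Copy a page on behalf of a fault handler. On a short copy, queue the
 * failure and report a hwpoison fault instead of consuming the poison.
 */
static enum sim_fault sim_copy_for_fault(unsigned char *dst,
					 const struct sim_page *src)
{
	if (sim_copy_mc(dst, src) != 0) {
		sim_memory_failure_queue(src);
		return SIM_FAULT_HWPOISON;
	}
	return SIM_FAULT_OK;
}
```

The point of the sketch is the shape of the error path: the caller checks
the copy result and turns a short copy into a fault code, so only the
faulting task is affected rather than the whole system.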
Thanks. Sorry, lots of paperwork and bureaucracy:
Is a copy of the oops(?) output available?
Did someone else report this? If so, is a Reported-by available for
that? And a Link: for the Reported-by:, which is a coming thing.
Can we identify a Fixes: target?
Is a cc:stable appropriate?
We are trying to support ARCH_HAS_COPY_MC on arm64 [1] and to recover
from CoW faults [2]; Tony did the same thing (recovering from CoW) on
x86 [3]. The kernel copy in ksm_might_need_to_copy() can recover in the
same way. This is an enhancement of COPY_MC, so I think there is no need
to add Fixes: or cc:stable tags.
Thanks.
[1] https://lore.kernel.org/linux-arm-kernel/20220812070557.1028499-1-tongtiangen@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-arm-kernel/20220812070557.1028499-5-tongtiangen@xxxxxxxxxx/
[3] https://lore.kernel.org/lkml/20221031201029.102123-2-tony.luck@xxxxxxxxx/