Hi,
On 9/23/2024 9:39 PM, Jiaqi Yan wrote:
+ /*
+ * On ARM64, if APEI failed to claim SEA (e.g. GHES driver doesn't
+ * register for SEA notifications from firmware), memory_failure will
+ * never be synchronous to the error consumption thread. Notifying
+ * it via SIGBUS synchronously has to be done by either core kernel in
+ * do_mem_abort, or KVM in kvm_handle_guest_abort.
+ */
+ if (!sysctl_enable_hard_offline) {
+ pr_info_once("%#lx: disabled by /proc/sys/vm/enable_hard_offline\n", pfn);
+ kill_procs_now(p, pfn, flags, page_folio(p));
+ res = -EOPNOTSUPP;
+ goto unlock_mutex;
+ }
+
I am curious why SIGBUS is sent without setting PG_hwpoison on the
page. 0/2 seems to indicate that threads coordinate with each other
so that clean subpages in a poisoned hugetlb page remain accessible,
and that at some point (or perhaps I misread) the poisoned page
(sub- or huge-) will eventually be isolated, because it's unthinkable
to leave a poisoned page lying around while the kernel treats it like
a clean page. But I'm not sure how you plan to handle that without
PG_hwpoison while hard offline is disabled globally.

Another thing I'm curious about is whether you have tested with a real
hardware UE, the kind that triggers an MCE. When a real UE is consumed
by the training process, the user process must longjmp out in order to
avoid getting stuck on the same instruction that fetched the UE memory.
Given that a longjmp is needed (unless I am missing something), the
training process is already in a situation where it has to figure out
things like rewinding, where to restart from, whether it even keeps
state, etc. On the whole, whether it is worthwhile to ask the user
application to carry the burden of what's lacking in the kernel, namely
the inability to split a hugetlb page, is something that needs to be
weighed.
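
To make the longjmp point concrete, below is a rough userspace sketch of
the pattern I have in mind. It is only an illustration under stated
assumptions: every name in it (run_guarded, install_sigbus_handler,
fake_poison_consumer, clean_step) is hypothetical and not from the patch,
and the SIGBUS is simulated with raise() rather than coming from a real
UE, so it says nothing about how the proposed sysctl behaves.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* All names below are hypothetical, for illustration only. */
static sigjmp_buf recover_point;
static volatile sig_atomic_t recover_armed;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	/*
	 * For a real memory error, info->si_code would be BUS_MCEERR_AR
	 * and info->si_addr the faulting address. This sketch ignores both.
	 */
	(void)info;

	if (recover_armed)
		siglongjmp(recover_point, 1);

	/* Not inside a guarded region: fall back to the default action. */
	signal(SIGBUS, SIG_DFL);
	raise(SIGBUS);
}

/* Install the SIGBUS handler; returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Run step(arg). Returns 0 if it completed, 1 if it was cut short by
 * SIGBUS. savemask=1 makes siglongjmp restore the signal mask, so
 * SIGBUS is unblocked again after we leave the handler.
 */
int run_guarded(void (*step)(void *), void *arg)
{
	if (sigsetjmp(recover_point, 1) != 0) {
		recover_armed = 0;
		return 1;	/* caller must decide how to rewind/restart */
	}
	recover_armed = 1;
	step(arg);
	recover_armed = 0;
	return 0;
}

/* Stand-in for a step that consumes poisoned memory (simulated). */
void fake_poison_consumer(void *arg)
{
	(void)arg;
	raise(SIGBUS);
}

/* Stand-in for a step that touches only clean memory. */
void clean_step(void *arg)
{
	*(int *)arg += 1;
}
```

Even in this toy form, a return value of 1 from run_guarded() only says
that the step did not finish. Everything after that (what state is still
valid, where to restart) is left to the application, which is the burden
I am referring to above.
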
Thanks,
-jane