Re: [RFC PATCH v1 1/2] mm/memory-failure: introduce global MFR policy

jane.chu@xxxxxxxxxx · Wed, 2 Oct 2024 16:50:43 -0700

Hi,

On 9/23/2024 9:39 PM, Jiaqi Yan wrote:

+	/*
+	 * On ARM64, if APEI fails to claim SEA (e.g. GHES driver doesn't
+	 * register to SEA notifications from firmware), memory_failure will
+	 * never be synchronous to the error consumption thread. Notifying
+	 * it via SIGBUS synchronously has to be done by either core kernel in
+	 * do_mem_abort, or KVM in kvm_handle_guest_abort.
+	 */
+	if (!sysctl_enable_hard_offline) {
+		pr_info_once("%#lx: disabled by /proc/sys/vm/enable_hard_offline\n", pfn);
+		kill_procs_now(p, pfn, flags, page_folio(p));
+		res = -EOPNOTSUPP;
+		goto unlock_mutex;
+	}
+

I am curious why the SIGBUS is sent without setting PG_hwpoison on the 
page.  0/2 seems to indicate that threads coordinate with each other so 
that clean subpages in a poisoned hugetlb page remain accessible, and 
that at some point (or perhaps I misread) the poisoned page (sub- or 
huge-) will eventually be isolated, because it's unthinkable to leave a 
poisoned page lying around while the kernel treats it like a clean page.  
But I'm not sure how you plan to handle that without PG_hwpoison while 
hard_offline is disabled globally.

Another thing I'm curious about is whether you have tested with a real 
hardware UE, the kind that triggers an MCE.  When a real UE is consumed 
by the training process, the user process must longjmp out in order to 
avoid getting stuck at the same instruction that fetched the UE memory.  
Given that a longjmp is needed (unless I am missing something), the 
training process is already in a situation where it has to figure out 
things like rewinding, where to restart from, whether it even keeps 
state, etc.  On the whole, whether it is worthwhile to ask the user 
application to deal with what's lacking in the kernel, namely the 
ability to split up a hugetlb page, is something that needs to be 
weighed.
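
To make the longjmp point concrete, here is a rough userspace sketch of what I have in mind; it is not taken from the patch or from any real training framework.  All names (run_guarded, simulated_ue_step, etc.) are hypothetical, and the "UE" is only simulated with raise(SIGBUS), so si_addr carries no meaningful address here; a real hardware error would arrive from the kernel with a valid si_addr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; a sketch, not a reference implementation. */
static sigjmp_buf recover_point;
static volatile sig_atomic_t recover_armed;
static void *volatile last_fault_addr;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	if (!recover_armed) {
		/* Not in a guarded step: restore the default action so a
		 * repeated fault terminates the process. */
		signal(SIGBUS, SIG_DFL);
		return;
	}
	/* For a kernel-reported memory error this is the faulting address. */
	last_fault_addr = info->si_addr;
	/* Do not return: returning would re-execute the faulting load. */
	siglongjmp(recover_point, 1);
}

int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

void *last_sigbus_addr(void)
{
	return last_fault_addr;
}

/* Runs step(arg). Returns 0 if it completed, -1 if SIGBUS cut it short. */
int run_guarded(void (*step)(void *), void *arg)
{
	assert(step != NULL);

	/* savemask=1 so SIGBUS is unblocked again after siglongjmp. */
	if (sigsetjmp(recover_point, 1) != 0) {
		recover_armed = 0;
		return -1; /* caller decides where to rewind and restart */
	}
	recover_armed = 1;
	step(arg);
	recover_armed = 0;
	return 0;
}

/* Stand-in for work that touches only clean memory. */
void clean_step(void *arg)
{
	++*(int *)arg;
}

/* Stand-in for consuming a UE: the signal is raised by hand. */
void simulated_ue_step(void *arg)
{
	(void)arg;
	raise(SIGBUS);
}
```

Even in this toy form, a -1 from run_guarded() only tells the caller that the step was abandoned partway; deciding what state is still valid and where to resume is left entirely to the application, which is the burden I am asking about.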

Thanks,

-jane