On 2024/10/4 7:51, Jiaqi Yan wrote:
> Hi Jane,
>
> On Wed, Oct 2, 2024 at 4:50 PM <jane.chu@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> On 9/23/2024 9:39 PM, Jiaqi Yan wrote:
>>>
>>> +	/*
>>> +	 * On ARM64, if APEI failed to claim SEA, (e.g. GHES driver doesn't
>>> +	 * register to SEA notifications from firmware), memory_failure will
>>> +	 * never be synchronous to the error consumption thread. Notifying
>>> +	 * it via SIGBUS synchronously has to be done by either core kernel in
>>> +	 * do_mem_abort, or KVM in kvm_handle_guest_abort.
>>> +	 */
>>> +	if (!sysctl_enable_hard_offline) {
>>> +		pr_info_once("%#lx: disabled by /proc/sys/vm/enable_hard_offline\n", pfn);
>>> +		kill_procs_now(p, pfn, flags, page_folio(p));
>>> +		res = -EOPNOTSUPP;
>>> +		goto unlock_mutex;
>>> +	}
>>> +
>>
>> I am curious why the SIGBUS is sent without setting PG_hwpoison in the
>> page. In 0/2 there seems to be an indication that threads coordinate
>> with each other such that clean subpages in a poisoned hugetlb page
>> continue to be accessible, and at some point (or perhaps I misread),
>> the poisoned page (sub- or huge-) will eventually be isolated, because
>
> The code here is "global policy". The "per-VMA policy", proposed in
> 0/2 but code not sent, should be able to support isolation + offline
> at some point (all VMAs are gone and page becomes free).
>
>> it's unthinkable to let a poisoned page lie around while the kernel
>> treats it like a clean page. But I'm not sure how you plan to handle it
>> without PG_hwpoison while hard_offline is disabled globally.
>
> It will become the responsibility of a control plane running in
> userspace. For example, the control plane immediately prevents starting
> of any new workload/VM, but chooses to wait until memory errors exceed
> a certain threshold, or holds on to the hosts until all workloads/VMs
> are migrated and then repairs the machine. Not setting PG_hwpoison is
> indeed a big difference and risk, so it needs to be carefully handled
> by userspace.
>

Could you explain why PG_hwpoison cannot be set in this case? It seems
a control plane running in userspace can work with PG_hwpoison set.
PG_hwpoison makes sure hwpoisoned pages won't be reused by the kernel,
while the control plane prevents them from being re-accessed from
userspace. Or am I missing something?

Thanks.