On Tue, Jan 17, 2023 at 10:34 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > For SRAO signaled via **machine check exception**, my reading of the
> > current x86 MCE code is this:
> ...
> > 3) therefore, do_machine_check just skips kill_me_now or
> > kill_me_maybe, and directly goto out:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/mce/core.c#n1539
>
> That does appear to be what we do. But it looks like a regression from older
> behavior. An SRAO machine check *ought* to call memory_failure() without
> the MF_ACTION_REQUIRED bit set in flags.
>
> -Tony
>

Oh, maybe SRAO signaled via MCE calls memory_failure() with these async
code paths?

1. __mc_scan_banks => mce_log => mce_gen_pool_add + irq_work_queue(mce_irq_work)
2. mce_irq_work_cb => mce_schedule_work => schedule_work(&mce_work)
3. mce_work => mce_gen_pool_process => blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce) => mce_uc_nb => uc_decode_notifier => memory_failure