> On Feb 19, 2020, at 2:33 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
>>
>> One big question here: are memory failure #MC exceptions synchronous
>> or can they be delayed? If we get a memory failure, is it possible
>> that the #MC hits some random context and not the actual context where
>> the error occurred?
>
> There are a few cases:
> 1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
> These aren't synchronous with any core execution. Using machine check to signal
> was probably a mistake - compounded by it being broadcast :-( Could pick any CPU
> to handle (actually choose the first to arrive in do_machine_check()). That guy should
> arrange to soft offline the affected page. Every CPU can return to what they were doing
> before.

You could handle this by sending IPI-to-self and dealing with it in the
interrupt handler. Or even wake a high-priority kthread or workqueue.
irq_work may help. Relying on task_work or the non_atomic stuff seems
silly - you can’t rely on anything about the interrupted context, and
the context is more or less irrelevant anyway.

>
> 2) SRAR (Software recoverable action required)
> These are synchronous. Starting with Skylake they may be signaled just to the thread
> that hit the poison. Earlier generations broadcast.

Here’s where dealing with one that came from kernel code is just nasty,
right? I would argue that, if IF=0, killing the machine is reasonable.
If IF=1, we should be okay. Actually making this work sanely is gross,
and arguably the goal should be minimizing grossness. Perhaps, if we
came from kernel mode, we should IPI-to-self and use a special vector
that is idtentry, not apicinterrupt. Or maybe even do this for entries
from usermode just to keep everything consistent.

> 2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
> 2b) Memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
> 2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a
>
> 3) Fatal
> Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shutdown.

Easy :)

It would be really, really nice if NMI was masked in MCE context.

>
> -Tony
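
To make the triage above concrete, here is a rough userspace sketch of the decision as I read it from this thread. It is not kernel code, and every name in it (mce_kind, mce_ctx, mce_action, mce_decide) is made up for illustration. Two things are my assumptions rather than anything stated above: that the IF=0 check takes priority over a fixup entry, and that a kernel-mode SRAR with IF=1 but no recovery path ends in a panic, since copy_from_user recovery is not implemented yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a machine check, following cases 1-3. */
enum mce_kind { MCE_SRAO, MCE_SRAR, MCE_FATAL };

/* Hypothetical outcomes; these are labels, not kernel entry points. */
enum mce_action {
	ACT_DEFER_SOFT_OFFLINE,	/* case 1: leave #MC context, soft offline the page later */
	ACT_SIGBUS_AND_OFFLINE,	/* case 2a: offline the page and SIGBUS the task(s) */
	ACT_FIXUP_RECOVERY,	/* case 2b: "return" to the recovery code */
	ACT_PANIC		/* case 3, or an SRAR we cannot recover from */
};

/* Hypothetical snapshot of the interrupted context. */
struct mce_ctx {
	enum mce_kind kind;
	bool user_mode;		/* #MC arrived from ring 3 */
	bool irqs_enabled;	/* IF was set in the interrupted context */
	bool has_fixup;		/* interrupted RIP has a recovery path */
};

static enum mce_action mce_decide(const struct mce_ctx *c)
{
	assert(c != NULL);

	switch (c->kind) {
	case MCE_FATAL:
		/* Some bank reported PCC: nothing to recover. */
		return ACT_PANIC;
	case MCE_SRAO:
		/*
		 * Not synchronous with any core: the interrupted context is
		 * irrelevant, so defer the work (IPI-to-self, irq_work, ...).
		 */
		return ACT_DEFER_SOFT_OFFLINE;
	case MCE_SRAR:
		if (c->user_mode)
			return ACT_SIGBUS_AND_OFFLINE;
		/* Assumption: IF=0 in kernel mode is treated as fatal first. */
		if (!c->irqs_enabled)
			return ACT_PANIC;
		if (c->has_fixup)
			return ACT_FIXUP_RECOVERY;
		/* Assumption: no recovery path yet (the copy_from_user case). */
		return ACT_PANIC;
	}
	return ACT_PANIC;
}
```

With this, an SRAR taken in ring 3 maps to ACT_SIGBUS_AND_OFFLINE regardless of the other fields, while a kernel-mode SRAR only reaches the fixup path when IF was set. If 2c ever gets implemented, the last SRAR branch is where it would change.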