On Mon, Jan 11, 2021 at 01:44:50PM -0800, Tony Luck wrote: > @@ -1431,8 +1433,11 @@ noinstr void do_machine_check(struct pt_regs *regs) > mce_panic("Failed kernel mode recovery", &m, msg); > } > > - if (m.kflags & MCE_IN_KERNEL_COPYIN) > + if (m.kflags & MCE_IN_KERNEL_COPYIN) { > + if (current->mce_busy) > + mce_panic("Multiple copyin", &m, msg); So this: we're currently busy handling the first MCE, why do we must panic? Can we simply ignore all follow-up MCEs to that page? I.e., the page will get poisoned eventually and that poisoning is currently executing so all following MCEs are simply nothing new and we can ignore them. It's not like we're going to corrupt more data - we already are "corrupting" whole 4K. Am I making sense? Because if we do this, we won't have to pay attention to any get_user() callers and whatnot - we simply ignore and the solution is simple and you won't have to touch any get_user() callers... Hmmm? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette