On Fri, May 26, 2023 at 02:32:42PM +0800, Kefeng Wang wrote:
> The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
> then let the core do_machine_check() to isolate corrupted page instead
> of doing it one-by-one.

No, this whole thing is confused.

 * Indicates an MCE that happened in kernel space while copying data
 * from user.

#define MCE_IN_KERNEL_COPYIN

This is a very specific exception type: EX_TYPE_COPY which got added by

  278b917f8cb9 ("x86/mce: Add _ASM_EXTABLE_CPY for copy user access")

but Linus then removed all such user copy exception points in

  034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")

So now that EX_TYPE_COPY never happens.

And what you're doing is lumping the handling for
EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE together and saying
that the MCE happened while copying data from user.

And XSTATE_OP() is one example where this is not really the case.

So no, this is not correct.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette