On Fri, Aug 09, 2024 at 03:22:19AM +0200, Andrew Zaborowski wrote:
> I don't have a "real world" use case, we hit these two bugs in HW
> testing.

You inject MCEs or what testing do you mean here?

In what pages? I presume user...

So instead of the process getting killed, you want to return SIGBUS
because, "hey caller, your process encountered an MCE while being
attempted to be executed"?

> Qemu relies on the SIGBUS logic but the execve and rseq
> cases cannot be recovered from, the main benefit of sending the
> correct signal is perhaps information to the user.

You will have that info in the logs - we're usually very loud when we
get an MCE...

> If this cannot be fixed then optimally it should be documented.

I'm not convinced at all that jumping through hoops you're doing, is
worth the effort.

> As for "all that code", the memory failure handling code is of certain
> size and this is a comparatively tiny fix for a tiny issue.

No, I didn't say anything about the memory failure code - it is about
supporting that obscure use case and the additional logic you're adding
to the #MC handler which looks like a real mess already and us having to
support that use case indefinitely.

So why does it matter if a process which is being executed and gets an
MCE beyond the point of no return absolutely needs to return SIGBUS vs
it getting killed and you still get an MCE logged on the machine, in
either case?

I mean, I would understand it when the parent process can do something
meaningful about it but if not, why does it matter at all?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette