Re: [RESEND][PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE

Borislav Petkov <bp@xxxxxxxxx> · Fri, 9 Aug 2024 10:34:00 +0200

On Fri, Aug 09, 2024 at 03:22:19AM +0200, Andrew Zaborowski wrote:
> I don't have a "real world" use case, we hit these two bugs in HW
> testing.

Do you inject MCEs, or what kind of testing do you mean here?

In what pages? I presume user...

So instead of the process getting killed, you want to return SIGBUS
because, "hey caller, your process encountered an MCE while we were
attempting to execute it"?

> Qemu relies on the SIGBUS logic but the execve and rseq
> cases cannot be recovered from, the main benefit of sending the
> correct signal is perhaps information to the user.

You will have that info in the logs - we're usually very loud when we
get an MCE...

> If this cannot be fixed then optimally it should be documented.

I'm not convinced at all that the hoops you're jumping through are
worth the effort.

> As for "all that code", the memory failure handling code is of certain
> size and this is a comparatively tiny fix for a tiny issue.

No, I didn't say anything about the memory failure code - it is about
supporting that obscure use case, the additional logic you're adding
to the #MC handler, which looks like a real mess already, and us having
to support that use case indefinitely.

So why does it matter whether a process which is being executed and gets
an MCE beyond the point of no return returns SIGBUS or simply gets
killed? You still get an MCE logged on the machine in either case.

I mean, I would understand it if the parent process could do something
meaningful about it, but if not, why does it matter at all?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette