Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery

"Luck, Tony" <tony.luck@xxxxxxxxx> · Tue, 12 Jan 2021 17:50:53 -0800

On Tue, Jan 12, 2021 at 02:04:55PM -0800, Andy Lutomirski wrote:
> > But we know that the fault happend in a get_user() or copy_from_user() call
> > (i.e. an RIP with an extable recovery address).  Does context switch
> > access user memory?
> 
> No, but NMI can.
> 
> The case that would be very very hard to deal with is if we get an NMI just before IRET/SYSRET and get #MC inside that NMI.
> 
> What we should probably do is have a percpu list of pending memory failure cleanups and just accept that we’re going to sometimes get a second MCE (or third or fourth) before we can get to it.
> 
> Can we do the cleanup from an interrupt?  IPI-to-self might be a credible approach, if so.

You seem to be looking for a solution that is entirely contained within
the machine check handling code. Willing to allow for repeated machine
checks from the same poison address in order to achieve that.

I'm opposed to mutliple machine checks.  Willing to make some changes
in core code to avoid repeated access to the same poison location.

We need a tie-breaker.

-Tony