On Tue, Jun 23, 2020 at 04:49:40PM +0200, Joerg Roedel wrote:
> > We're talking about the 3rd case where the only reason things 'work' is
> > because we'll have to panic():
> >
> >  - #MC
>
> Okay, #MC is special and can only be handled on a best-effort basis, as
> #MC could happen anytime, also while already executing the #MC handler.

I think the hardware has a MCE-mask bit somewhere. Flaky though because
clearing it isn't 'atomic' with IRET, so there's a 'funny' window.

It also interacts really bad with the NMI handler. If we get an #MC
early in the NMI, where we hard-rely on the NMI-mask being set to set-up
the recursion stack, then the #MC IRET will clear the NMI-mask, and
we're toast.

Andy has wild and crazy ideas, but I don't think we need more crazy
here.

#VC SNP has a similar problem vs NMI, that needs to die() irrespective
of the #VC IST recursion.

> >  - #DB with BUS LOCK DEBUG EXCEPTION
>
> If I understand the problem correctly, this can be solved by moving off
> the IST stack to the current task stack in the #DB handler, like I plan
> to do for #VC, no?

Hmm, probably. Would take a bit of care, but should be doable.

> >  - #VC SNP
>
> This has to panic for other reasons that can't be worked around. It
> boils down to detecting that the HV is doing something fishy and bail
> out to avoid further harm (like in the #MC handler).

Right, but it doesn't take away that IST-any-time vectors are
fundamentally screwy.

Both the MCE and NMI have masks that are, as per the above, differently
funny, but the other ISTs do not. Also, even if they had masks, the
interaction between them is still screwy.

#VC would've been so much better if it would've had a mask bit
somewhere, then at least we could've had the exception entry covered.
Another #VC with the mask set should probably result in #DF or Shutdown,
but that's all water under the bridge I suspect.
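
To make the #MC-inside-NMI window above concrete, here is a toy model in
plain C. It is only a sketch and not kernel code: struct toy_cpu and all
the toy_* helpers are made-up names for illustration, and the only
hardware behaviour it assumes is that NMI delivery sets the NMI-mask and
that any IRET clears it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the CPU's hidden NMI-blocking state. */
struct toy_cpu {
	bool nmi_masked;	/* set on NMI delivery, cleared by any IRET */
};

/* IRET unblocks NMIs no matter which handler executes it. */
static void toy_iret(struct toy_cpu *cpu)
{
	cpu->nmi_masked = false;
}

/* A nested #MC handler that survives and returns through IRET. */
static void toy_mc_handler(struct toy_cpu *cpu)
{
	toy_iret(cpu);
}

/*
 * Model the early part of the NMI handler, which relies on the NMI-mask
 * while it sets up its recursion stack. Returns true if a second NMI
 * could be delivered inside that setup window.
 */
static bool toy_second_nmi_hits_setup_window(bool mc_during_setup)
{
	struct toy_cpu cpu = { .nmi_masked = false };

	cpu.nmi_masked = true;		/* first NMI delivered */

	if (mc_during_setup)
		toy_mc_handler(&cpu);	/* #MC IRET drops the NMI-mask */

	/* Setup is still in progress here; is a pending NMI blocked? */
	return !cpu.nmi_masked;
}

/* Small self-check covering both cases. */
static void toy_self_check(void)
{
	assert(!toy_second_nmi_hits_setup_window(false));
	assert(toy_second_nmi_hits_setup_window(true));
}
```

Without the #MC the mask holds for the whole setup window; with an #MC
in that window the model reports that a second NMI can get in before the
recursion stack is ready, which is the "we're toast" case.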