On Tue, Jun 23, 2020 at 8:23 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 23/06/2020 14:03, Peter Zijlstra wrote:
> > On Tue, Jun 23, 2020 at 02:12:37PM +0200, Joerg Roedel wrote:
> >> On Tue, Jun 23, 2020 at 01:50:14PM +0200, Peter Zijlstra wrote:
> >>> If SNP is the sole reason #VC needs to be IST, then I'd strongly urge
> >>> you to only make it IST if/when you try and make SNP happen, not before.
> >> It is not the only reason, when ES guests gain debug register support
> >> then #VC also needs to be IST, because #DB can be promoted into #VC
> >> then, and as #DB is IST for a reason, #VC needs to be too.
> > Didn't I read somewhere that that is only so for Rome/Naples but not for
> > the later chips (Milan) which have #DB pass-through?
>
> I don't know about hardware timelines, but some future part can now opt
> in to having debug registers as part of the encrypted state, and swapped
> by VMExit, which would make debug facilities generally usable, and
> supposedly safe to the #DB infinite loop issues, at which point the
> hypervisor need not intercept #DB for safety reasons.
>
> Its worth nothing that on current parts, the hypervisor can set up debug
> facilities on behalf of the guest (or behind its back) as the DR state
> is unencrypted, but that attempting to intercept #DB will redirect to
> #VC inside the guest and cause fun.  (Also spare a thought for 32bit
> kernels which have to cope with userspace singlestepping the SYSENTER
> path with every #DB turning into #VC.)

What do you mean 32-bit?  64-bit kernels have exactly the same
problem.  At least the stack is okay, though.

Anyway, since I'm way behind on this thread, here are some thoughts:

First, I plan to implement actual precise recursion detection for the
IST stacks.  We'll be able to reliably panic when unallowed recursion
happens.

Second, I don't object *that* strongly to switching to a second #VC
stack if an NMI or MCE happens, but we really need to make sure we
cover *all* the bases.
And #VC is distressingly close to "happens at all kinds of
unfortunate times and the guest doesn't actually have much ability to
predict it" right now.

So we have #VC + #DB + #VC, #VC + NMI + #VC, #VC + MCE + #VC, and even
worse options.  So doing the shift in a reliable way is not
necessarily possible in a clean way.

Let me contemplate.  And maybe produce some code soon.
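
For illustration only, here is a toy user-space model of the kind of
recursion check mentioned above.  It is not kernel code and not the
implementation promised in this mail; every name in it (ist_vec,
ist_state, ist_enter, ist_exit, ist_first_recursion) is hypothetical.
It assumes one IST stack per vector and treats any re-entry on a stack
that is still in use as fatal, because the CPU reloads RSP from the
same IST slot on every delivery and the nested frame would overwrite
the outer one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector IDs for this toy model; not the kernel's definitions. */
enum ist_vec { IST_VC, IST_DB, IST_NMI, IST_MCE, IST_COUNT };

/* One flag per IST stack: true while a handler is running on that stack. */
struct ist_state {
    bool in_use[IST_COUNT];
};

/*
 * Model exception entry on the IST stack for vector v.
 * Returns false if that stack is already in use, which is the point
 * where a real kernel would have to panic instead of continuing.
 */
bool ist_enter(struct ist_state *s, enum ist_vec v)
{
    if (s->in_use[v])
        return false;
    s->in_use[v] = true;
    return true;
}

/* Model the handler returning; the stack becomes available again. */
void ist_exit(struct ist_state *s, enum ist_vec v)
{
    assert(s->in_use[v]);   /* exits must pair with a successful entry */
    s->in_use[v] = false;
}

/*
 * Replay a sequence of nested deliveries (no handler returns in between).
 * Returns the index of the first disallowed re-entry, or -1 if none.
 */
int ist_first_recursion(const enum ist_vec *seq, size_t n)
{
    struct ist_state s = { { false } };

    for (size_t i = 0; i < n; i++)
        if (!ist_enter(&s, seq[i]))
            return (int)i;
    return -1;
}
```

Feeding the model the #VC + NMI + #VC sequence from above flags the
third delivery (index 2), while #VC + #DB + NMI passes because each
vector lands on its own stack.  The model deliberately ignores the
"switch to a second #VC stack" idea; it only shows detection.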