On Thu, Feb 18, 2021 at 04:28:36PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 18, 2021 at 11:21 AM Joerg Roedel <jroedel@xxxxxxx> wrote:
> Can you give me an example, even artificial, in which the linked-list
> logic is useful?

So here we go, it's of course artificial, but still:

	1. #VC happens, not important where
	2. NMI in the #VC prologue before it moved off its IST stack
	   - first VC IST adjustment happening here
	3. #VC in the NMI handler
	4. #HV in the #VC prologue again
	   - second VC IST adjustment happening here, so the #HV
	     handler can cause its own #VC exceptions.

This can only happen if the #HV handler is allowed to cause #VC
exceptions. But even if it's not allowed, it can happen with SNP and a
malicious Hypervisor. But in this case the only option is to reliably
panic.

> Can you explain your reasoning in considering the entry stack unsafe?
> It's 4k bytes these days.

I wasn't aware that it is 4k in size now. I still thought it was just
64 words large and that one cannot simply execute C code on it.

> You forgot about entry_SYSCALL_compat.

Right, thanks for pointing this out.

> Your 8-byte alignment is confusing to me. In valid kernel code, SP
> should be 8-byte-aligned already, and, if you're trying to match
> architectural behavior, the CPU aligns to 16 bytes.

Yeah, I was just being cautious. The explicit alignment can be removed,
Boris also pointed this out.

> We're not robust against #VC, NMI in the #VC prologue before the magic
> stack switch, and a new #VC in the NMI prologue. Nor do we appear to
> have any detection of the case where #VC nests directly inside its own
> prologue. Or did I miss something else here?

No, you didn't miss anything here. At the moment #VC can't happen at
those places, so this is not handled yet. With SNP it can happen and
needs to be handled in a way that at least allows a reliable panic
(because if it really happens, the Hypervisor is messing with us).

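To make the chain of IST adjustments in the four-step example above a
bit more concrete, here is a toy user-space model. It is only a sketch
under assumptions and not the kernel code: the stack is a plain array
addressed by index, the sizes are illustrative, and all names
(deliver_vc, ist_enter, ist_exit, simulate_nesting) are hypothetical.
Each adjustment stores the previous IST value just below the
interrupted stack pointer, so the saved values form a linked list that
can be unwound in reverse order.

```c
#include <assert.h>

#define VC_STACK_WORDS  64     /* illustrative size, not the kernel's */
#define FRAME_WORDS     6      /* illustrative exception frame size */
#define FRAME_MARKER    (-2L)  /* stands in for saved register state */
#define NOT_ON_VC_STACK (-1L)  /* interrupted context was elsewhere */

static long vc_stack[VC_STACK_WORDS];  /* toy #VC IST stack */
static long ist_vc = VC_STACK_WORDS;   /* index the next #VC starts from */

/* Model delivery of a #VC: the frame is pushed starting at ist_vc. */
static long deliver_vc(void)
{
	long sp = ist_vc;

	assert(sp >= FRAME_WORDS);
	for (int i = 0; i < FRAME_WORDS; i++)
		vc_stack[--sp] = FRAME_MARKER;
	return sp;  /* stack pointer inside the #VC prologue */
}

/* Adjust the IST entry so a nested #VC cannot overwrite a live frame. */
static void ist_enter(long interrupted_sp)
{
	long old_ist = ist_vc;
	long new_ist = old_ist;

	/* Only move below the interrupted SP if it was on the #VC stack. */
	if (interrupted_sp != NOT_ON_VC_STACK)
		new_ist = interrupted_sp;

	assert(new_ist >= 1);
	new_ist -= 1;
	vc_stack[new_ist] = old_ist;  /* link to the previous IST value */
	ist_vc = new_ist;
}

/* Undo the most recent adjustment by following the saved link. */
static void ist_exit(void)
{
	assert(ist_vc < VC_STACK_WORDS);
	ist_vc = vc_stack[ist_vc];
}

/* Walk through steps 1-4 and unwind again; returns the final IST index. */
static long simulate_nesting(void)
{
	long sp;

	ist_vc = VC_STACK_WORDS;
	sp = deliver_vc();  /* 1. #VC happens */
	ist_enter(sp);      /* 2. NMI in the #VC prologue, first adjustment */
	sp = deliver_vc();  /* 3. #VC in the NMI handler */
	ist_enter(sp);      /* 4. #HV in the #VC prologue, second adjustment */
	ist_exit();         /* #HV handler returns */
	ist_exit();         /* NMI handler returns */
	return ist_vc;
}
```

In this model simulate_nesting() ends with the IST index back at the
top of the stack, and the nested #VC frame from step 3 lands below the
link saved in step 2 instead of on top of the first frame.
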
> If we get NMI and get #VC in the NMI *asm*, the #VC magic stack switch
> looks like it will merrily run itself in the NMI special-stack-layout
> section, and that sounds really quite bad.

Yes, I haven't looked at the details yet, but if a #VC happens there it
had probably better not return.

> I mean that, IIRC, a malicious hypervisor can inject inappropriate
> vectors at inappropriate times if the #HV mechanism isn't enabled.
> For example, it could inject a page fault or an interrupt in a context
> in which we have the wrong GSBASE loaded.

Yes, a malicious Hypervisor can do that, and without #HV there is no
real protection against this besides turning all vectors (even IRQs)
into paranoid entries. Maybe even more care is needed, but I think it's
not worth caring about this.

> But the #DB issue makes this moot. We have to use IST unless we turn
> off SCE. But I admit I'm leaning toward turning off SCE until we have
> a solution that seems convincingly robust.

Turning off SCE might be tempting, but I guess doing so would break
quite some user-space code, no?

Regards,

	Joerg