> On May 10, 2022, at 10:22 AM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Sat, Apr 30, 2022, Jon Kohler wrote:
>>
>>> On Apr 30, 2022, at 5:50 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>>> So let me try to understand this use case: you have a guest and a bunch
>>> of vCPUs which belong to it. And that guest gets switched between those
>>> vCPUs and KVM does IBPB flushes between those vCPUs.
>>>
>>> So either I'm missing something - which is possible - but if not, that
>>> "protection" doesn't make any sense - it is all within the same guest!
>>> So that existing behavior was silly to begin with so we might just as
>>> well kill it.
>>
>> Close, it's not one guest with a bunch of vCPUs, it's a bunch of guests
>> with a small number of vCPUs each; that's the small nuance here, and one
>> of the reasons why this was hard to see from the beginning.
>>
>> AFAIK, the KVM IBPB is avoided when switching between vCPUs belonging
>> to the same vmcs/vmcb (i.e. the same guest), e.g. you could have one VM
>> highly oversubscribed to the host and you wouldn't see either the KVM
>> IBPB or the switch_mm() IBPB. All good.
>
> No, KVM does not avoid IBPB when switching between vCPUs in a single VM.
> Every vCPU has a separate VMCS/VMCB, and so the scenario described above
> where a single VM has a bunch of vCPUs running on a limited set of
> logical CPUs will emit IBPB on every single switch.

Ah! Right, ok, thanks for helping me get my wires uncrossed there; I was
getting confused by the nested optimization made in commit 5c911beff
("KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02").

So the only time we'd *not* issue IBPB is if the current per-vCPU
vmcs/vmcb is still loaded in the non-nested case, or when switching
between vmcs01 and vmcs02 (i.e. between L1 and L2 of the same vCPU) in
the nested case.

Walking through my thoughts again here with this fresh in my mind:

In that example, say guest A has vCPU0 and vCPU1 and the host has to
switch between the two on the same pCPU. That switch doesn't go through
switch_mm() because both vCPU threads share the same mm_struct.
However, I'd wager that if you had an attacker in the guest executing
an attack on vCPU0 with the intent of attacking vCPU1 (which is up to
run next), you'd have far bigger problems: that would imply the guest
is already completely compromised, so why would they waste time on a
complex prediction attack when they have that level of system access in
the first place?

Going back to the original commit documentation that Boris called out,
which specifically says:

 * Mitigate guests from being attacked by other guests.
   - This is addressed by issuing IBPB when we do a guest switch.

Since you need to go through switch_mm() to change mm_struct from
guest A to guest B, it makes no sense to issue the barrier in KVM as
well: the kernel is already giving that protection "for free" (from
KVM's perspective), because the guest-to-guest transition is already
covered by cond_mitigation().

That would apply equally to switches in both the nested and non-nested
cases, because switch_mm() has to be called when switching between
guests.
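
To make that redundancy concrete, here is a quick userspace sketch (not
kernel code; names like vcpu_ctx and kvm_load_vcpu are made up for
illustration) that models the two IBPB decision points discussed above:
KVM's barrier on VMCS/VMCB switch (the shape of vmx_vcpu_load_vmcs()
after 5c911beff) and the switch_mm() path (cond_mitigation() in its
simplest "always IBPB" flavor). For an intra-guest vCPU switch only the
KVM-side barrier fires; for a guest-to-guest switch both fire:

/* ibpb_model.c - toy model, NOT kernel code. Build: gcc ibpb_model.c */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a vCPU: which guest it belongs to (its
 * mm_struct, effectively) and its per-vCPU VMCS/VMCB identity. */
struct vcpu_ctx {
	int vm_id;
	int vmcs_id;
};

static int loaded_vmcs_id = -1;	/* per-pCPU: currently loaded VMCS  */
static int last_user_mm = -1;	/* per-pCPU: last user mm that ran  */

static void ibpb(const char *who)
{
	printf("  IBPB issued by %s\n", who);
}

/*
 * Models KVM's barrier on VMCS/VMCB switch: barrier only when a
 * different VMCS gets loaded, and since every vCPU has its own
 * VMCS/VMCB, any vCPU switch counts.
 */
static void kvm_load_vcpu(const struct vcpu_ctx *v, bool kvm_ibpb)
{
	if (v->vmcs_id != loaded_vmcs_id) {
		loaded_vmcs_id = v->vmcs_id;
		if (kvm_ibpb)
			ibpb("KVM (VMCS switch)");
	}
}

/*
 * Models the switch_mm() path: barrier only when the mm actually
 * changes, i.e. only on guest-to-guest transitions.
 */
static void switch_mm_to(const struct vcpu_ctx *v)
{
	if (v->vm_id != last_user_mm) {
		last_user_mm = v->vm_id;
		ibpb("switch_mm()/cond_mitigation()");
	}
}

static void run(const struct vcpu_ctx *v, bool kvm_ibpb)
{
	printf("run VM%d vCPU (vmcs %d):\n", v->vm_id, v->vmcs_id);
	switch_mm_to(v);	/* context switch to the vCPU thread */
	kvm_load_vcpu(v, kvm_ibpb);	/* then KVM loads the vCPU   */
}

int main(void)
{
	const struct vcpu_ctx a0 = { .vm_id = 0, .vmcs_id = 0 };
	const struct vcpu_ctx a1 = { .vm_id = 0, .vmcs_id = 1 };
	const struct vcpu_ctx b0 = { .vm_id = 1, .vmcs_id = 2 };

	/* A.vCPU0 -> A.vCPU1 -> B.vCPU0 on one pCPU */
	run(&a0, true);
	run(&a1, true);	/* same guest: only the KVM barrier fires    */
	run(&b0, true);	/* new guest: switch_mm() already fires one  */

	return 0;
}

With the KVM-side barrier removed (pass false above), the A -> B
transition is still covered by the switch_mm() barrier, and the
intra-guest A.vCPU0 -> A.vCPU1 switch loses only the barrier argued
above to be unnecessary. Obviously the real cond_mitigation() has extra
conditions (static keys for the conditional vs. always modes,
dumpability checks), but the mm-must-change requirement is the part
that matters for this argument.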