On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, May 12, 2022, Jon Kohler wrote:
> >
> >
> > > On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Thu, May 12, 2022, Sean Christopherson wrote:
> > >> On Thu, May 12, 2022, Jon Kohler wrote:
> > >>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> > >>> attack surface is already covered by switch_mm_irqs_off() ->
> > >>> cond_mitigation().
> > >>>
> > >>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> > >>> wrong in its guest-to-guest design intention. There are three scenarios
> > >>> at play here:
> > >>
> > >> Jim pointed offline that there's a case we didn't consider. When switching between
> > >> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> > >> different security domains. E.g. the guest will not get a notification that vCPU0 is
> > >> being swapped out for vCPU1 on a single pCPU.
> > >>
> > >> So, sadly, after all that, I think the IBPB needs to stay. But the documentation
> > >> most definitely needs to be updated.
> > >>
> > >> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> > >> use cases where a single VM is running a single workload.
> > >
> > > Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> > > because then the IBPB is fully redundant with respect to any IBPB performed by
> > > switch_mm_irqs_off(). Hrm, though it might need a KVM or per-VM knob, e.g. just
> > > because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> > >
> > > That would also sidestep the largely theoretical question of whether vCPUs from
> > > different VMs but the same address space are in the same security domain. It doesn't
> > > matter, because even if they are in the same domain, KVM still needs to do IBPB.
> >
> > So should we go back to the earlier approach where we have it be only
> > IBPB on always_ibpb? Or what?
> >
> > At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> > IBPB’ing even when the user did not explicitly tell us to.
>
> I think we need separate controls for the guest. E.g. if the userspace VMM is
> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> mean that the entire guest it's running is sufficiently hardened.
>
> > That said, since I just re-read the documentation today, it does specifically
> > suggest that if the guest wants to protect *itself* it should turn on IBPB or
> > STIBP (or other mitigations galore), so I think we end up having to think
> > about what our “contract” is with users who host their workloads on
> > KVM - are they expecting us to protect them in any/all cases?
> >
> > Said another way, the internal guest areas of concern aren’t something
> > the kernel would always be able to A) identify far in advance and B)
> > always solve on the users behalf. There is an argument to be made
> > that the guest needs to deal with its own house, yea?
>
> The issue is that the guest won't get a notification if vCPU0 is replaced with
> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> IBPB. Since the host doesn't know whether or not the guest wants IBPB, unless the
> owner of the host is also the owner of the guest workload, the safe approach is to
> assume the guest is vulnerable.

Exactly. And if the guest has used taskset as its mitigation strategy,
how is the host to know?
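
To make the "skip only when the mm_structs differ" condition concrete, here is a
rough, untested sketch of what that check could look like at vCPU load time. To be
clear, kvm_vcpu_load_maybe_ibpb(), the "prev" vCPU bookkeeping, and the per-VM
skip_same_mm_ibpb capability are all made-up names for illustration only, not
existing KVM code:

/*
 * Sketch only, not actual KVM code: the function, the "prev" vCPU
 * bookkeeping, and the per-VM skip_same_mm_ibpb capability are
 * hypothetical, used purely to illustrate the condition discussed above.
 */
static void kvm_vcpu_load_maybe_ibpb(struct kvm_vcpu *vcpu, struct kvm_vcpu *prev)
{
	/* Same vCPU (or nothing loaded before): nothing to separate. */
	if (!prev || prev == vcpu)
		return;

	/*
	 * Different mm_structs: switch_mm_irqs_off()->cond_mitigation()
	 * already had the opportunity to emit IBPB for this transition,
	 * so a barrier here would be fully redundant.
	 */
	if (prev->kvm->mm != vcpu->kvm->mm)
		return;

	/*
	 * Same mm, typically vCPUs of the same VM: the guest never sees
	 * the pCPU swap and can't emit its own IBPB, so default to doing
	 * it here unless the VM owner has explicitly opted out.
	 */
	if (!vcpu->kvm->arch.skip_same_mm_ibpb)
		indirect_branch_prediction_barrier();
}

In other words, the barrier would stay on by default for same-mm switches, and the
per-VM knob would only be for the case where the host owner also owns the guest
workload and knows it doesn't need the protection.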