> On May 12, 2022, at 4:27 PM, Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>
> On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>
>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>
>>>
>>>> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, May 12, 2022, Sean Christopherson wrote:
>>>>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
>>>>>> attack surface is already covered by switch_mm_irqs_off() ->
>>>>>> cond_mitigation().
>>>>>>
>>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
>>>>>> wrong in its guest-to-guest design intention. There are three scenarios
>>>>>> at play here:
>>>>>
>>>>> Jim pointed out offline that there's a case we didn't consider. When switching between
>>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
>>>>> different security domains. E.g. the guest will not get a notification that vCPU0 is
>>>>> being swapped out for vCPU1 on a single pCPU.
>>>>>
>>>>> So, sadly, after all that, I think the IBPB needs to stay. But the documentation
>>>>> most definitely needs to be updated.
>>>>>
>>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
>>>>> use cases where a single VM is running a single workload.
>>>>
>>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
>>>> because then the IBPB is fully redundant with respect to any IBPB performed by
>>>> switch_mm_irqs_off(). Hrm, though it might need a KVM or per-VM knob, e.g. just
>>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
>>>>
>>>> That would also sidestep the largely theoretical question of whether vCPUs from
>>>> different VMs but the same address space are in the same security domain. It doesn't
>>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
>>>
>>> So should we go back to the earlier approach where we have it be only
>>> IBPB on always_ibpb? Or what?
>>>
>>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
>>> IBPB’ing even when the user did not explicitly tell us to.
>>
>> I think we need separate controls for the guest. E.g. if the userspace VMM is
>> sufficiently hardened then it can run without the "do IBPB" flag, but that doesn't
>> mean that the entire guest it's running is sufficiently hardened.
>>
>>> That said, since I just re-read the documentation today, it does specifically
>>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
>>> STIBP (or other mitigations galore), so I think we end up having to think
>>> about what our “contract” is with users who host their workloads on
>>> KVM - are they expecting us to protect them in any/all cases?
>>>
>>> Said another way, the internal guest areas of concern aren’t something
>>> the kernel would always be able to A) identify far in advance and B)
>>> always solve on the user’s behalf. There is an argument to be made
>>> that the guest needs to deal with its own house, yea?
>>
>> The issue is that the guest won't get a notification if vCPU0 is replaced with
>> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
>> IBPB. Since the host doesn't know whether or not the guest wants IBPB, unless the
>> owner of the host is also the owner of the guest workload, the safe approach is to
>> assume the guest is vulnerable.
>
> Exactly. And if the guest has used taskset as its mitigation strategy,
> how is the host to know?

Yea, that's fair enough. I posed a solution in my reply to Sean’s response just as
this email came in; I'd love to know your thoughts (keying off the MSR bitmap).
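To make the MSR-bitmap idea a bit more concrete, here is a rough, untested sketch
of the direction I'm thinking of (the helper names below are illustrative
stand-ins, not existing KVM code): skip the vCPU-load IBPB unless the guest has
actually exercised MSR_IA32_PRED_CMD, which KVM already tracks implicitly because
it disables interception of that MSR the first time the guest writes it.

/*
 * Illustrative only: guest_has_used_pred_cmd() is a made-up helper standing
 * in for "is MSR_IA32_PRED_CMD passed through in this vCPU's MSR bitmap",
 * i.e. the guest has written the MSR at least once and presumably cares
 * about indirect branch prediction barriers.
 */
static bool guest_wants_ibpb(struct kvm_vcpu *vcpu)
{
	return guest_has_used_pred_cmd(vcpu);
}

static void kvm_vcpu_load_cond_ibpb(struct kvm_vcpu *vcpu)
{
	/*
	 * Cross-mm switches are already covered by switch_mm_irqs_off() ->
	 * cond_mitigation(), so only emit a barrier here for guests that
	 * have shown interest in IBPB.
	 */
	if (guest_wants_ibpb(vcpu))
		indirect_branch_prediction_barrier();
}

The open question with such a scheme is still the one Sean raised: whether this
should additionally be gated by a KVM or per-VM knob, since the guest's use of
the MSR only tells us it wants the mitigation, not whether the host owner is
willing to pay for it on every vCPU load.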