On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, May 12, 2022, Jon Kohler wrote:
> >
> >
> > > On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Thu, May 12, 2022, Sean Christopherson wrote:
> > >> On Thu, May 12, 2022, Jon Kohler wrote:
> > >>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> > >>> attack surface is already covered by switch_mm_irqs_off() ->
> > >>> cond_mitigation().
> > >>>
> > >>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> > >>> wrong in its guest-to-guest design intention. There are three scenarios
> > >>> at play here:
> > >>
> > >> Jim pointed offline that there's a case we didn't consider. When switching between
> > >> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> > >> different security domains. E.g. the guest will not get a notification that vCPU0 is
> > >> being swapped out for vCPU1 on a single pCPU.
> > >>
> > >> So, sadly, after all that, I think the IBPB needs to stay. But the documentation
> > >> most definitely needs to be updated.
> > >>
> > >> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> > >> use cases where a single VM is running a single workload.
> > >
> > > Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> > > because then the IBPB is fully redundant with respect to any IBPB performed by
> > > switch_mm_irqs_off(). Hrm, though it might need a KVM or per-VM knob, e.g. just
> > > because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> > >
> > > That would also sidestep the largely theoretical question of whether vCPUs from
> > > different VMs but the same address space are in the same security domain. It doesn't
> > > matter, because even if they are in the same domain, KVM still needs to do IBPB.
> >
> > So should we go back to the earlier approach where we have it be only
> > IBPB on always_ibpb? Or what?
> >
> > At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> > IBPB’ing even when the user did not explicitly tell us to.
>
> I think we need separate controls for the guest. E.g. if the userspace VMM is
> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> mean that the entire guest it's running is sufficiently hardened.
>
> > That said, since I just re-read the documentation today, it does specifically
> > suggest that if the guest wants to protect *itself* it should turn on IBPB or
> > STIBP (or other mitigations galore), so I think we end up having to think
> > about what our “contract” is with users who host their workloads on
> > KVM - are they expecting us to protect them in any/all cases?
> >
> > Said another way, the internal guest areas of concern aren’t something
> > the kernel would always be able to A) identify far in advance and B)
> > always solve on the users behalf. There is an argument to be made
> > that the guest needs to deal with its own house, yea?
>
> The issue is that the guest won't get a notification if vCPU0 is replaced with
> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> IBPB. Since the host doesn't know whether or not the guest wants IBPB, unless the
> owner of the host is also the owner of the guest workload, the safe approach is to
> assume the guest is vulnerable.

Exactly. And if the guest has used taskset as its mitigation strategy,
how is the host to know?
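
To make the "skip only when the mm_structs differ" condition concrete, here is a
rough, untested sketch of what that check could look like at vCPU load time. To be
clear, kvm_vcpu_load_maybe_ibpb(), the "prev" vCPU bookkeeping, and the per-VM
skip_same_mm_ibpb capability are all made-up names for illustration only, not
existing KVM code:

/*
 * Sketch only, not actual KVM code: the function, the "prev" vCPU
 * bookkeeping, and the per-VM skip_same_mm_ibpb capability are
 * hypothetical, used purely to illustrate the condition discussed above.
 */
static void kvm_vcpu_load_maybe_ibpb(struct kvm_vcpu *vcpu, struct kvm_vcpu *prev)
{
	/* Same vCPU (or nothing loaded before): nothing to separate. */
	if (!prev || prev == vcpu)
		return;

	/*
	 * Different mm_structs: switch_mm_irqs_off()->cond_mitigation()
	 * already had the opportunity to emit IBPB for this transition,
	 * so a barrier here would be fully redundant.
	 */
	if (prev->kvm->mm != vcpu->kvm->mm)
		return;

	/*
	 * Same mm, typically vCPUs of the same VM: the guest never sees
	 * the pCPU swap and can't emit its own IBPB, so default to doing
	 * it here unless the VM owner has explicitly opted out.
	 */
	if (!vcpu->kvm->arch.skip_same_mm_ibpb)
		indirect_branch_prediction_barrier();
}

In other words, the barrier would stay on by default for same-mm switches, and the
per-VM knob would only be for the case where the host owner also owns the guest
workload and knows it doesn't need the protection.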