Re: [PATCH 3/3] KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Thu, 2 Apr 2020 13:51:09 -0700

On Thu, Apr 02, 2020 at 10:07:07PM +0200, Thomas Gleixner wrote:
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
> > On Thu, Apr 02, 2020 at 07:19:44PM +0200, Thomas Gleixner wrote:
> >> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
> > That puts KVM in a weird spot if/when intercepting #AC is no longer
> > necessary, e.g. "if" future CPUs happen to gain a feature that traps into
> > the hypervisor (KVM) if a potential near-infinite ucode loop is detected.
> >
> > The only reason KVM intercepts #AC (before split-lock) is to prevent a
> > malicious guest from executing a DoS attack on the host by putting the #AC
> > handler in ring 3.  Current CPUs will get stuck in ucode vectoring #AC
> > faults more or less indefinitely, e.g. long enough to trigger watchdogs in
> > the host.
> 
> Which is thankfully well documented in the VMX code and the
> corresponding chapter in the SDM. 
> 
> > Injecting #AC if and only if KVM is 100% certain the guest wants the #AC
> > would lead to divergent behavior if KVM chose to not intercept #AC, e.g.
> 
> AFAICT, #AC is not really something which is performance relevant, but I
> might obviously be uninformed on that.
> 
> Assumed it is not, then there is neither a hard requirement nor a real
> incentive to give up on intercepting #AC even when future CPUs have a
> fix for the above wreckage.

Agreed that there's no hard requirement, but generally speaking, the less
KVM needs to poke into the guest the better.

> > some theoretical unknown #AC source would conditionally result in exits to
> > userspace depending on whether or not KVM wanted to intercept #AC for
> > other reasons.
> 
> I'd rather like to know when there is an unknown #AC source instead of
> letting the guest silently swallow it.

Trying to prevent the guest from squashing a spurious fault is a fool's
errand.  For example, with nested virtualization, the correct behavior
from an architectural perspective is to forward exceptions from L2 (the
nested VM) to L1 (the direct VM) that L1 wants to intercept.  E.g. if L1
wants to intercept #AC faults that happen in L2, then KVM reflects all #AC
faults into L1 as VM-Exits without ever reaching this code.
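
To make the L2 -> L1 reflection concrete, here's a rough standalone sketch
of the check.  This is not the actual KVM code: the struct and function
names are made up for illustration, and it ignores the extra error-code
mask/match filtering that applies to #PF.  The only architectural bits it
relies on are that #AC is vector 17 and that the exception bitmap has one
bit per vector for vectors 0-31.

```c
#include <stdbool.h>
#include <stdint.h>

#define AC_VECTOR	17	/* #AC is exception vector 17 */

/* Hypothetical stand-in for the VMCS state L1 programs for L2. */
struct l1_vmcs_sketch {
	uint32_t exception_bitmap;	/* bit N set => L1 intercepts vector N */
};

/*
 * Returns true if an exception that occurred in L2 should be reflected
 * into L1 as a VM-Exit, i.e. L1 set the vector's bit in its bitmap.
 * Only vectors 0-31 are covered by the bitmap.
 */
bool l1_wants_exception(const struct l1_vmcs_sketch *vmcs12,
			unsigned int vector)
{
	if (vector >= 32)
		return false;

	return (vmcs12->exception_bitmap & (1u << vector)) != 0;
}
```

If l1_wants_exception(vmcs12, AC_VECTOR) is true, the #AC goes to L1 and
L0 never gets a chance to decide whether or not it was spurious.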

Nested virt aside, detecting spurious #AC and a few other exceptions is
mostly feasible, but for many exceptions it's flat out impossible.

Anyways, this particular case isn't a sticking point, i.e. I'd be ok with
exiting to userspace on a spurious #AC, I just don't see the value in doing
so.  Returning KVM_EXIT_EXCEPTION doesn't necessarily equate to throwing up
a red flag, e.g. from a kernel perspective you'd still be relying on the
userspace VMM to report the error in a sane manner.  I think at one point
Xiaoyao had a WARN_ON for a spurious #AC, but it was removed because the
odds of a false positive due to some funky corner case seemed higher than
detecting a CPU bug.
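
For reference, the "spurious #AC" decision being discussed boils down to
something like the sketch below.  Again, this is illustrative only and not
the patch itself: the types and names are hypothetical, and split-lock #AC
is deliberately left out.  It leans only on the architectural rule that a
legacy alignment-check #AC requires CR0.AM=1, EFLAGS.AC=1 and CPL=3.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_AM	(1ull << 18)	/* CR0 alignment mask */
#define X86_EFLAGS_AC	(1ull << 18)	/* EFLAGS alignment check */

/* Hypothetical outcome of handling an intercepted #AC. */
enum ac_action {
	AC_INJECT_INTO_GUEST,	/* guest asked for legacy #AC, give it back */
	AC_EXIT_TO_USERSPACE,	/* nothing explains the #AC, report it */
};

/* Hypothetical snapshot of the guest state needed for the check. */
struct guest_state_sketch {
	uint64_t cr0;
	uint64_t rflags;
	unsigned int cpl;
};

/* Legacy #AC is only possible with CR0.AM=1, EFLAGS.AC=1 and CPL=3. */
bool guest_alignment_check_enabled(const struct guest_state_sketch *g)
{
	return g->cpl == 3 &&
	       (g->cr0 & X86_CR0_AM) &&
	       (g->rflags & X86_EFLAGS_AC);
}

/* Split-lock #AC is intentionally not modeled here. */
enum ac_action classify_intercepted_ac(const struct guest_state_sketch *g)
{
	if (guest_alignment_check_enabled(g))
		return AC_INJECT_INTO_GUEST;

	return AC_EXIT_TO_USERSPACE;
}
```

The AC_EXIT_TO_USERSPACE leg is the one I don't see much value in, since
whether anyone notices still depends on what the userspace VMM does with
the exit.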

> TBH, the more I learn about this, the more I tend to just give up on
> this whole split lock stuff in its current form and wait until HW folks
> provide something which is actually usable:
> 
>    - Per thread
>    - Properly distinguishable from a regular #AC via error code
> 
> OTOH, that means I won't be able to use it before retirement. Oh well.
> 
> Thanks,
> 
>         tglx