Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching

Oliver Upton <oupton@xxxxxxxxxx> · Mon, 28 Mar 2022 17:28:17 +0000

On Fri, Mar 25, 2022 at 11:53:05PM +0000, Sean Christopherson wrote:
> On Thu, Mar 24, 2022, Oliver Upton wrote:
> > On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> > > On 3/24/22 18:44, Sean Christopherson wrote:
> > > > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > > > both of these instructions really should #UD when executed on the wrong
> > > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > > > guest's instruction with the appropriate instruction for the vendor.
> > > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > > > do not patch in the appropriate instruction using alternatives, likely
> > > > > motivating KVM's intervention.
> > > > > 
> > > > > Add a quirk allowing userspace to opt out of hypercall patching.
> > > > 
> > > > A quirk may not be appropriate; per Paolo, the whole cross-vendor thing is
> > > > intentional.
> > > > 
> > > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@xxxxxxxxxx
> > > 
> > > It's intentional, but the days of the patching part are over.
> > > 
> > > KVM should just call emulate_hypercall if called with the wrong opcode
> > > (which in turn can be quirked away).  That however would be more complex to
> > > implement because the hypercall path wants to e.g. inject a #UD with
> > > kvm_queue_exception().
> > > 
> > > All this makes me want to just apply Oliver's patch.
> > > 
> > > > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > > > +		ctxt->exception.error_code_valid = false;
> > > > > +		ctxt->exception.vector = UD_VECTOR;
> > > > > +		ctxt->have_exception = true;
> > > > > +		return X86EMUL_PROPAGATE_FAULT;
> > > > 
> > > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > > > will also end up generating a #UD in most cases, but will play nice with
> > > > KVM_CAP_EXIT_ON_EMULATION_FAILURE.
> > 
> > Sean and I were looking at this together right now, and it turns out
> > that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
> > bounce out to userspace.
> > 
> > x86_decode_emulated_instruction() would need to be the spot we bail to
> > guard these exits with the CAP.
> 
> I've spent waaay too much time thinking about this...
> 
> TL;DR: I'm in favor of applying the patch as-is.
> 
> My objection to manually injecting the #UD is that there's no guarantee that KVM
> is actually handling a #UD.  It's largely a moot point, as KVM barfs on VMCALL/VMMCALL
> if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is
> emulating due to lack of unrestricted guest.  Forcing #UD is probably a net positive
> in that case, as it will cause KVM to ultimately fail with "emulation error" and
> bail to userspace.
> 
> It is possible to handle this in decode (idea below), but it will set us up for
> pain later.  If KVM ever does gain support for truly emulating hypercall

There was another annoyance that motivated me to sidestep emulation
altogether.

'Correct' emulation (or whatever we decide to call what KVM does) of the hypercall
instruction would require that we actually make the emulator nested-aware for
both vendors' hypercall instructions. By that I mean both {svm,vmx}_check_intercept
would need to correctly handle both VMCALL and VMMCALL. The one nice thing about
hypercall patching is that we could keep L1 oblivious, as we would have already
rewritten the instruction before reflecting the exit to L1.

While I was looking at #UD under nested for this issue, I noticed:

Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
on #UD, even though it is possible that L0 was emulating an instruction not
present in hardware (like RDPID). If L1 passed through RDPID, the #UD
should not be reflected to L1. I believe fixing this would require making
the emulator aware of nVMX, which sounds like a science project on its
own.

Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)

--
Thanks,
Oliver